Abstract

Spatial heterogeneity is a hallmark of living systems, even at the molecular scale in individual cells. A
key example is the partitioning of membrane-bound proteins via lipid domain formation or cytoskeleton- 
induced corralling. Yet the impact of this spatial heterogeneity on biochemical signaling processes is 
poorly understood. Here we demonstrate that partitioning improves the reliability of biochemical sig- 
naling. We exactly solve a stochastic model describing a ubiquitous motif in membrane signaling. The 
solution reveals that partitioning improves signaling reliability via two effects: it moderates the non- 
linearity of the switching response, and it reduces noise in the response by suppressing correlations 
between molecules. An optimal partition size arises from a trade-off between minimizing the number 
of proteins per partition to improve signaling reliability and ensuring sufficient proteins per partition to 
maintain signal propagation. The predicted optimal partition size agrees quantitatively with experimen- 
tally observed systems. These results persist in spatial simulations with explicit diffusion barriers. Our 
findings suggest that molecular partitioning is not merely a consequence of the complexity of cellular 
substructures, but also plays an important functional role in cell signaling. 

The cell membrane is a nexus of information processing. Once regarded as a simple barrier between a 
cell and its surroundings, it is now clear that the membrane is a hotspot of molecular activity, where 
signals are integrated and modulated even before being relayed to the inside of the cell [1]. Moreover, 
the membrane itself is structurally complex. Regions enriched in glycosphingolipids, cholesterol, and other 
membrane components, often called lipid rafts, transiently assemble and float within the surrounding bilayer 
[2], providing platforms for molecular interaction [3]. Additionally, interaction of the membrane with the 
underlying actin cytoskeleton forms compartments in which molecules are transiently trapped [4,5]. These 
membrane sub-domains create a highly heterogeneous environment in which molecules are far from well
mixed, and it is currently unclear what effect this heterogeneity has on cell signaling. 

Membrane sub-domains are thought to play a dominant role in the observed aggregation of signaling 
molecules into clusters [6]. Interestingly, these clusters have a characteristic size of only a few molecules. 
For example, the GPI-anchored receptor CD59 is observed to form clusters of three to nine molecules upon 
interaction with the cytoskeleton and lipid rafts [7,8]. Similarly, the well-studied membrane-bound GTPase 
Ras forms clusters of six to eight molecules which also depend on interactions with the cytoskeleton and 
rafts [9, 10]. Despite the important findings that aggregation of proteins induced by sub-domains can affect 
reaction kinetics [11], enhance oligomerization [1], modulate downstream responses [12,13] and enhance sig- 
nal fidelity [13,14], the origin of this characteristic size remains unknown. While it is quite possible that these 
domains owe their size to a thermodynamic or structural origin, we here address the question of whether 
this size can be optimized for signaling performance. We find that the partitioning imposed by sub-domains 
gives rise to a trade-off in cell signaling, from which an optimal size of a few molecules emerges naturally, 
suggesting that reliable signaling is intimately tied to the spatial structure of the membrane. 

We study via stochastic analysis and spatial simulation a model that is directly motivated by both CD59
and Ras signaling at the membrane. Stimulated CD59 receptors induce the switching of several Src-family
kinases from an unphosphorylated to a phosphorylated state [7,8]. Similarly, stimulated EGF receptors
induce the switching of Ras proteins from an inactive GDP-loaded state to an active GTP-loaded state [13].
We therefore study the simple and ubiquitous motif of coupled switching reactions, in which the activation
of one species (the receptor) triggers the activation of a second species (the downstream effector).

Figure 1: Schematic depiction of the model system. A We consider a model representative of signal detection
by receptors and signal transmission at the cell membrane. B The model consists of two molecular species
(X and Y), which can each exist in active (X*, Y*) or inactive (X, Y) states. Molecules in the X state are
activated by the external signal of strength $\alpha$, and active X* molecules subsequently activate Y molecules. C
We consider these reactions taking place in a single domain with all components well mixed, or in a domain
consisting of smaller compartments which are each individually well mixed but between which no interaction
is possible. The total system volumes in the two scenarios are equal and assumed to scale with the number
of X molecules.

We exactly solve this stochastic model of coupled switching reactions, and we use the solution to compare 
signaling reliability in a spatially-partitioned system to that in a well-mixed system. We demonstrate that 
partitioning can improve signaling performance by generating a more graded input-output relation and by 
reducing the noise in the signaling response. This latter effect comes about because partitioning reduces the 
correlations between the states of the different output molecules. On the other hand, the stochastic exchange 
of proteins between partitions can generate configurations which isolate molecules and exclude them from 
the signaling process, thereby reducing the dynamic range of the response and increasing the output noise. 
The trade-off between these two effects results in an optimal partition size that agrees well with cluster sizes 
of signaling proteins that are observed experimentally [7-10], suggesting that cluster sizes are tuned so as to 
maximize information transmission. 



1 Results 



We model two coupled molecular species at the membrane, as depicted in Fig. 1A. A membrane-bound
receptor (e.g. CD59 or EGF receptor) is activated via ligand stimulation, and the active receptor in turn
activates a membrane-bound effector (e.g. a Src-family kinase or Ras). A reaction scheme representing these
processes is shown in Fig. 1B, and consists of two protein species: the receptor X and the downstream effector
Y. The switching of X molecules from the X to the X* state is driven by an external signal of strength $\alpha$.
Active X* molecules act on inactive Y molecules and promote switching to the Y* state. Deactivation of
both active protein species occurs spontaneously and independently.

We will be concerned with how the network response, the number of active Y* molecules as a function of
the input signal $\alpha$, is affected by the spatial structure of the system. In particular we ask how partitioning
of the reaction system into non-interacting sub-domains affects the reliability of signal transmission, which
is determined by two principal factors: the input-output response and the output noise. Together these
properties determine to what extent different input signals can be reliably resolved from the network response.
We focus on two system configurations, shown in Fig. 1C. In the first case we assume that all molecules are
present in a single well-mixed reaction compartment. In the second case, we consider a system partitioned
into $\pi$ compartments between which no interactions are permitted; here we take the output of the system
to be the total number of active Y* molecules in all compartments. This choice of output corresponds to a
readout of the Y* signal by, e.g., a cytosolic component whose diffusion is much faster than the diffusion and
signaling of X and Y on the membrane. In the partitioned system, we will for simplicity first assume that
the molecules are uniformly and statically distributed among compartments. However, recognizing that this
scenario will not generally be realized inside cells, we will later relax this assumption and consider exchange
of molecules among partitions.

We model the dynamics of the well-mixed system, as well as each compartment within the partitioned system,
using a stochastic equation of the same form. We denote the total numbers of X and Y molecules by $M$ and
$N$, respectively, and the numbers of active X* and Y* molecules by $m$ and $n$, respectively. To parameterize
the system, we scale units of time by the deactivation rate of X*, such that the effective deactivation
rate is 1. Then $\alpha$ denotes the rescaled activation rate of X; $\gamma$ is the rate of deactivation of Y* relative
to that of X*; and $\gamma\beta_m$ is the activation rate of a given Y molecule for a particular concentration of X*
molecules. The parameter $\alpha$ incorporates the effective strength of the input signal and determines the mean
X* activity via the occupancy $q = \langle m \rangle/M = \alpha/(\alpha + 1)$. The precise $m$-dependence of the coupling function
$\beta_m$ will depend on the exact nature of the interactions between X* and Y molecules. We take $\beta_m \propto m/v$,
with $v$ the volume of the compartment in which the reactions are taking place. However, our conclusions are
unaffected if we instead take a Michaelis-Menten form $\beta_m \propto m/(m + vK)$ (Appendix C: Fig. 7). The total
system volume $V$ is assumed to scale with the total number of X molecules, such that $M/V$ is constant. The
coupling function in partition $i \in \{1, \dots, \pi\}$ is then determined by $m_i$, the number of X* molecules in
partition $i$, according to $\beta_{m_i} \propto m_i/(V/\pi) = \beta \pi m_i/M$ for constant $\beta$.

The probability $p_{mn}$ of having $m$ proteins in the X* state and $n$ proteins in the Y* state evolves according to
the chemical master equation (CME),

$$\dot{p}_{mn} = -\left[\mathcal{L}_m(\alpha, M) + \gamma\,\mathcal{L}_n(\beta_m, N)\right] p_{mn}, \tag{1}$$

subject to suitable boundary conditions. The nature of the particular set of reactions in our model (Fig. 1B)
means that the operators $\mathcal{L}_m$ and $\mathcal{L}_n$ have the same form,

$$\mathcal{L}_m(\alpha, M) = \alpha\left[1 - \mathbb{E}_m^{-1}\right](M - m) + \left[1 - \mathbb{E}_m^{+1}\right] m, \tag{2}$$

where $\mathbb{E}_m^{i} f(m) = f(m + i)$ defines the step operator. Despite the appearance of terms containing the
product $mn$ in the operator $\mathcal{L}_n(\beta_m, N)$, which make the direct calculation of moments of $p_{mn}$ from the
CME impossible, an exact solution to (1) can be found for arbitrary $\beta_m$ using the method of spectral
expansion [15,16] as described in Appendix A.1.
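For very small molecule numbers the stationary solution of the CME can also be checked by brute force. The sketch below is not the spectral method of Appendix A.1: it simply builds the transition-rate matrix for one well-mixed compartment and solves for its null vector. The function name `stationary_cme` and the argument `coupling` (the prefactor in a linear $\beta_m$ = `coupling * m`) are illustrative choices, and the linear form of $\beta_m$ is the assumption stated in the text.

```python
import numpy as np

def stationary_cme(M, N, alpha, gamma, coupling):
    """Stationary distribution p[m, n] of one well-mixed compartment.

    The compartment holds M X molecules and N Y molecules. Rates, in units
    of the X* deactivation rate:
      X  -> X*  at alpha per inactive X
      X* -> X   at 1 per active X
      Y  -> Y*  at gamma * coupling * m per inactive Y  (linear beta_m, assumed)
      Y* -> Y   at gamma per active Y
    """
    states = [(m, n) for m in range(M + 1) for n in range(N + 1)]
    index = {s: k for k, s in enumerate(states)}
    Q = np.zeros((len(states), len(states)))  # Q[j, i] is the rate from state i to state j
    for (m, n), i in index.items():
        moves = [
            ((m + 1, n), alpha * (M - m)),
            ((m - 1, n), 1.0 * m),
            ((m, n + 1), gamma * coupling * m * (N - n)),
            ((m, n - 1), gamma * n),
        ]
        for target, rate in moves:
            if rate > 0:
                Q[index[target], i] += rate
                Q[i, i] -= rate
    # Stationarity Q p = 0 has one redundant row; replace it with sum(p) = 1.
    A = Q.copy()
    A[-1, :] = 1.0
    rhs = np.zeros(len(states))
    rhs[-1] = 1.0
    return np.linalg.solve(A, rhs).reshape(M + 1, N + 1)


alpha, beta, gamma = 1.0, 20.0, 1.0
n_values = np.arange(3)

# Well mixed: one compartment with M = N = 2, so beta_m = beta * m / 2.
p_n_mixed = stationary_cme(2, 2, alpha, gamma, beta / 2).sum(axis=0)
mean_mixed = p_n_mixed @ n_values
var_mixed = p_n_mixed @ n_values**2 - mean_mixed**2

# Partitioned: two independent compartments with one X and one Y, beta_m = beta * m.
p_active = stationary_cme(1, 1, alpha, gamma, beta).sum(axis=0)[1]
mean_part = 2 * p_active
var_part = 2 * p_active * (1 - p_active)

print("mean response per Y molecule (mixed, partitioned):", mean_mixed / 2, mean_part / 2)
print("output variance (mixed, partitioned):", var_mixed, var_part)
```

The state space grows as $(M+1)(N+1)$, so this direct approach is only practical for the smallest systems; it is meant as a cross-check on the minimal $M = N = 2$ case rather than a replacement for the analytic solution.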



1.1 Partitioning leads to a more graded response 

We begin by analyzing the behavior of a minimal system with $M = N = 2$. In the well-mixed system, all
molecules are contained within $\pi = 1$ domain of volume $V$. In the partitioned system, $\pi = 2$ subdomains
with volume $V/2$ each contain one X and one Y molecule.

Figure 2: Spatial partitioning improves signaling performance. A The mean response $\langle n \rangle/N$ as a function
of the mean X* activity $q = \langle m \rangle/M = \alpha/(\alpha + 1)$, and B the output variance $\sigma^2$ as a function of the mean
response, plotted for a well-mixed system with $M = N = 2$ (thick solid) and a partitioned system of $\pi = 2$
compartments, each containing one X and one Y molecule (thick dashed). Partitioning linearizes the output
response and reduces noise across the full range of responses, leading to a higher transmitted information.
The thin solid curves show the mean-field response $\langle n \rangle/N = \beta q/(\beta q + 1)$ in A and the binomial noise
limit (3) in B. Allowing exchange of molecules between compartments (thick dot-dashed) compresses the
output response and increases the noise compared to the perfectly partitioned system, dramatically reducing
information transmission. Here $\beta = 20$ and $\gamma = 1$.

We first focus on the mean response $\langle n \rangle$. In the limits of small or large $\alpha$ the mean response is the same in
both the partitioned and mixed systems, $\langle n \rangle/N \to 0$ and $\langle n \rangle/N \to \beta/(\beta + 1)$ respectively. However, at all
intermediate values of $\alpha$, the mean response of the well-mixed system is larger than that of the partitioned
system; equivalently, the partitioned system exhibits a more graded response than the well-mixed system
to changes in the input signal (see Fig. 2A, thick solid and dashed curves). The more graded response is
due to higher fluctuations in X* activity. When $\alpha \to 0$ or $\alpha \to \infty$, all X molecules are inactive or active,
respectively; however, at intermediate values of $\alpha$, the number of active X* molecules fluctuates. Partitioning
reduces the number of X molecules per reaction compartment, increasing the relative size of these fluctuations
according to $\sigma_m^2/(M/\pi)^2 = \pi q(1 - q)/M$. These fluctuations are passed through the concave dependence
of $n$ on $m$, resulting in a smaller mean (via Jensen's inequality [17]), and therefore a more linear response
curve (see also Appendix C: Fig. 8A).
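The Jensen's-inequality argument can be illustrated with a deliberately simplified calculation. The sketch below assumes a quasi-static limit in which each Y molecule equilibrates instantly to the current number $m$ of X* molecules in its compartment, so that the mean response is the binomial average of the concave function $\beta_m/(1 + \beta_m)$. This limit is an assumption made here for illustration and is not the exact solution used in the paper; the function names are likewise hypothetical.

```python
from math import comb

def quasi_static_response(q, beta, M, n_part):
    """Mean Y* fraction if Y tracks the instantaneous X* count in its compartment.

    Assumes M is a multiple of n_part and a linear coupling
    beta_m = beta * m / (M / n_part), with m binomially distributed.
    """
    m_per_part = M // n_part
    total = 0.0
    for m in range(m_per_part + 1):
        weight = comb(m_per_part, m) * q**m * (1 - q) ** (m_per_part - m)
        beta_m = beta * m / m_per_part
        total += weight * beta_m / (1 + beta_m)
    return total

def mean_field_response(q, beta):
    """Response obtained by ignoring fluctuations in m altogether."""
    return beta * q / (beta * q + 1)

q, beta, M = 0.5, 20.0, 12
print("mean field:", mean_field_response(q, beta))
for n_part in (1, 2, 4, 12):
    print(n_part, "partitions:", quasi_static_response(q, beta, M, n_part))
```

Because the relative fluctuations in $m$ grow with the number of partitions, the averaged response falls further below the mean-field curve as the system is divided more finely, while the endpoints $q = 0$ and $q = 1$ are unchanged.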

A more graded input-output relation can potentially enhance signaling by expanding the range of input 
signals which the network is able to transmit without saturating the response. However, in order to determine 
whether this larger input range can be resolved in the network it is crucial to examine how the noise in the 
response is affected. 



1.2 Partitioning reduces noise 



Figure 2B shows the variance $\sigma^2$ of the output as a function of the mean response $\langle n \rangle$ for the system
with $M = N = 2$, as the input signal strength $\alpha$ is varied. We see that the output noise is reduced
in the partitioned system relative to the well-mixed system across the full range of response levels. The
noise reduction is surprising: one might expect that the increased fluctuations in X* activity that come with
partitioning would propagate to fluctuations in Y* activity. Indeed, this is the case: in a single compartment,
as the number of X molecules is reduced, the noise in the output increases (Appendix C: Fig. 8B). However,
this effect is overcome by a second effect: partitioning reduces correlations among output molecules.

Figure 3: Partitioning reduces correlations between output molecules. A In the partitioned system, each Y
molecule receives an independent signal $m_i(t)$. The variance is simply that of independent two-state switches.
B In the well-mixed system, each Y molecule reacts to the same $m(t)$, which leads to correlations between
the states of different Y molecules and an increase in the variance $\sigma^2$. Sample trajectories are generated
using parameters as in Fig. 2, with $\alpha = 1$.



To see the effect of partitioning on correlations, we consider the expressions for the variance. In the
partitioned case, since the two Y molecules switch independently, the variance of $n$ is simply that of a pair of
independent binomial switches with activation probability $\langle n \rangle/N$,

$$\frac{\sigma^2}{N} = \frac{\langle n \rangle}{N}\left(1 - \frac{\langle n \rangle}{N}\right). \tag{3}$$

In contrast, in the well-mixed case the two Y molecules are not independent. Since both are driven by the
same set of X molecules, fluctuations in $\beta_m$ lead to correlations between the states of the two Y molecules
as their switching becomes more synchronized (see Fig. 3). This in turn leads to an increase in the variance,
which can be written as

$$\frac{\sigma^2}{N} = \frac{\langle n \rangle}{N}\left(1 - \frac{\langle n \rangle}{N}\right) + \frac{\Delta}{N}, \tag{4}$$

where $\Delta$ is a correction term accounting for the correlations between Y molecules, which are due to "extrinsic"
fluctuations in the input $m(t)$. The functional form of $\Delta$ for any $M$ and $N$ follows directly from the spectral
solution of the CME (Appendix B: Eqn. 74); for $M = N = 2$ one finds by inspection that $\Delta$ is manifestly
positive, meaning that correlations increase the noise across all values of the mean. Importantly, this effect
is independent of the parameters of the switching reactions.



The reduction of noise upon partitioning extends beyond the case of one Y molecule per partition. Indeed the
same phenomenon is observed if we consider larger molecule numbers $M > \pi$ and $N > \pi$, and compare the
well-mixed system to a system with uniform partitioning of the X and Y molecules into the $\pi$ compartments.
In the well-mixed case all Y molecules respond to the same signal $m(t)$, and hence are correlated with all
other Y molecules in the system. By contrast, in the partitioned case the $N/\pi > 1$ Y molecules within each
partition are correlated, and indeed since the fluctuations in $m_i(t)$ will be larger than those in $m(t)$ for the
mixed system, such correlations will be stronger; yet the Y molecules in different partitions are uncorrelated.
This latter effect is sufficient to overcome the increase in correlations within each partition, such that the total
noise is reduced.
To see the noise reduction explicitly, we again consider the expression for the variance. Since the dynamics
of different partitions are independent, assuming that both $M$ and $N$ are multiples of $\pi$, the variance can be
written as

$$\frac{\sigma^2}{N} = \frac{\langle n \rangle}{N}\left(1 - \frac{\langle n \rangle}{N}\right) + \frac{\Delta(\bar{M}, \bar{N})}{\bar{N}}, \tag{5}$$

where $\bar{M} = M/\pi$ and $\bar{N} = N/\pi$ are the numbers of X and Y molecules per compartment, respectively.
Here, as before, $\Delta(\bar{M}, \bar{N})$ represents the additional fluctuations due to correlations between the states of Y
molecules within each compartment. The $\bar{N}$-dependence of $\Delta(\bar{M}, \bar{N})$, which reflects the number of correlated
pairs of Y molecules, can be straightforwardly factored out as $\Delta(\bar{M}, \bar{N}) = \bar{N}(\bar{N} - 1)\Delta(\bar{M})$, where $\Delta(\bar{M})$
describes how strongly correlated Y molecules are within each compartment. The exact form for $\Delta(\bar{M})$,
while straightforward to calculate for a given $\bar{M}$, is difficult to generalize for all $\bar{M}$; nonetheless, inspection
of numerical and analytic results for specific combinations of $M$ and $N$ reveals in all cases that increasing $\pi$
leads to an overall reduction in $\sigma^2$. Additionally, if the switching of Y molecules is much slower than that
of X molecules, $\gamma \ll 1$, then $\Delta(\bar{M})$ takes the form

$$\Delta(\bar{M}) \approx \frac{\alpha \beta^2 \gamma}{\bar{M}\,(1 + \alpha + \alpha\beta)^3}. \tag{6}$$

Inserting this expression into (5) with $\bar{M} = M/\pi$ and $\bar{N} = N/\pi$, one can straightforwardly see that the
variance is a decreasing function of $\pi$ for $\pi < N$, indicating that the noise is reduced as the system is more
finely partitioned.
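The dependence of the noise on the partition number in the slow-switching limit can be tabulated directly. The sketch below rests on two assumptions that are not taken verbatim from the text: the pair-correlation term is taken to be $\alpha\beta^2\gamma/[\bar{M}(1 + \alpha + \alpha\beta)^3]$, which is what a linear-noise estimate gives for $\gamma \ll 1$, and the mean response is approximated by its mean-field value $\beta q/(\beta q + 1)$. The function names are illustrative.

```python
def pair_correlation_slow_y(m_per_part, alpha, beta, gamma):
    """Assumed slow-switching (gamma << 1) correlation between two Y molecules."""
    return alpha * beta**2 * gamma / (m_per_part * (1 + alpha + alpha * beta) ** 3)

def variance_per_molecule(M, N, n_part, alpha, beta, gamma):
    """sigma^2 / N from the partitioned-variance expression, in the slow-Y limit."""
    m_per_part = M / n_part
    n_per_part = N / n_part
    q = alpha / (1 + alpha)
    f = beta * q / (beta * q + 1)  # mean-field <n>/N, an approximation
    delta = n_per_part * (n_per_part - 1) * pair_correlation_slow_y(
        m_per_part, alpha, beta, gamma
    )
    return f * (1 - f) + delta / n_per_part

M = N = 12
for n_part in (1, 2, 3, 4, 6, 12):
    print(n_part, variance_per_molecule(M, N, n_part, alpha=1.0, beta=20.0, gamma=0.01))
```

Under these assumptions the correlation contribution reduces to a term proportional to $(N - \pi)/M$, which falls linearly with the number of partitions and vanishes at $\pi = N$, where only the binomial noise remains.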



1.3 Partitioning increases information transmission



We have seen that partitioning has two beneficial effects on signal propagation: the input-output response
becomes more graded, and the output noise at a given response level is reduced. Together, these effects
mean that a larger number of distinct input signals can be encoded in the network response. To quantify
the ability of the network to transmit signals we calculate the mutual information $I[\alpha, n]$ [18] between the
input and the number of active Y* molecules, as described in Appendix A.2. We find that indeed, in the
case of $M = N = 2$ (Fig. 2), $I[\alpha, n]$ is significantly larger for the partitioned system ($I = 0.463$ bits) than
for the well-mixed system ($I = 0.332$ bits), an increase of roughly 40%, confirming that signal transmission
is improved by partitioning.
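For a discrete set of input levels, the mutual information between input and output can be evaluated directly from the conditional output distributions. The sketch below is generic: the input distribution actually used in Appendix A.2 is not reproduced here, so it is passed in as an argument, and the function name `mutual_information_bits` is an illustrative choice.

```python
import numpy as np

def mutual_information_bits(p_input, p_out_given_in):
    """Mutual information (bits) between a discrete input and a discrete output.

    p_input[k]           probability of input level k
    p_out_given_in[k, n] probability of output n given input level k
    """
    p_input = np.asarray(p_input, dtype=float)
    cond = np.asarray(p_out_given_in, dtype=float)
    joint = p_input[:, None] * cond            # p(input, output)
    p_out = joint.sum(axis=0)                  # marginal output distribution
    product = p_input[:, None] * p_out[None, :]
    mask = joint > 0                           # terms with zero probability contribute nothing
    return float(np.sum(joint[mask] * np.log2(joint[mask] / product[mask])))

# Two equally likely inputs that the output distinguishes perfectly, and not at all.
print(mutual_information_bits([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]]))
print(mutual_information_bits([0.5, 0.5], [[0.3, 0.7], [0.3, 0.7]]))
```

In the present setting each row of `p_out_given_in` would be the output distribution $p_n$ at one value of $\alpha$, so a more graded response and a narrower $p_n$ both make the rows easier to tell apart and raise the information.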



1.4 Exchange between partitions compromises signaling reliability

Thus far we have considered only the perfectly uniform and stationary partitioning of molecules. In reality, 
physical transport processes such as diffusion will also give rise to a variety of configurations with different 
numbers of proteins in each compartment, as depicted in Fig. 4. Each of these configurations will have 
different properties for the transmission of the signal from $\alpha$ to $n$. It is therefore important to consider
whether the benefits of partitioning described above persist once these additional configurations are taken 
into account. 

Single-molecule tracking experiments have revealed that the timescale of diffusive mixing within a
compartment (~100 μs) is two orders of magnitude faster than the timescale of molecular exchange between
compartments (~10 ms) [19]. This observation allows us to treat each configuration as static on the timescale 
of mixing, then compute the total response by averaging over all configurations. Inherent in this treatment 
is the assumption that the timescale of signaling is also faster than that of exchange between compartments. 
We later relax this assumption using spatially resolved simulations and nonetheless find similar results. 
Figure 4: Exchange between partitions leads to different configurations of the system with a range of signaling
performance. Multiplicities listed above each configuration are due to symmetry. Parameters are as in Fig. 2.



The total response is computed by first enumerating the possible configurations of $M$ X molecules and
$N$ Y molecules distributed amongst $\pi$ partitions. For each such configuration $c$ we then solve for the
output distribution $p_{n|c}$ and combine these distributions, weighted by the probability $p_c$ of each configuration
occurring if molecules are randomly assigned to different partitions with uniform and independent probability,
to give the overall response distribution $p_n = \sum_c p_{n|c}\, p_c$.
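The configuration average can be sketched as follows. Each species is placed independently, so the weight of a configuration is a product of two multinomial probabilities. The helper names, and the callback `output_given_config` standing in for whatever solver supplies the conditional output distribution of a configuration, are hypothetical.

```python
from itertools import product
from math import factorial

def occupancy_probabilities(n_molecules, n_part):
    """Probability of each occupancy tuple under uniform, independent placement."""
    probs = {}
    for occ in product(range(n_molecules + 1), repeat=n_part):
        if sum(occ) != n_molecules:
            continue
        ways = factorial(n_molecules)
        for k in occ:
            ways //= factorial(k)          # multinomial coefficient
        probs[occ] = ways / n_part**n_molecules
    return probs

def averaged_output(M, N, n_part, output_given_config):
    """Weighted sum of conditional output distributions over all configurations.

    output_given_config(x_occ, y_occ) must return a list of length N + 1 giving
    the distribution of the total Y* number for that configuration.
    """
    p_n = [0.0] * (N + 1)
    for x_occ, p_x in occupancy_probabilities(M, n_part).items():
        for y_occ, p_y in occupancy_probabilities(N, n_part).items():
            for n, value in enumerate(output_given_config(x_occ, y_occ)):
                p_n[n] += p_x * p_y * value
    return p_n

# Weight of the perfectly partitioned configuration for M = N = 2 and two partitions.
p_x = occupancy_probabilities(2, 2)
p_uniform = p_x[(1, 1)] * p_x[(1, 1)]
print(p_x, p_uniform)
```

The direct enumeration visits $(n+1)^{\pi}$ candidate tuples per species, which is adequate for the small systems considered here but would need a smarter generator of compositions for larger ones.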

Figure 2B (dot-dashed curve) shows that the exchange of molecules between compartments increases the
noise relative to the perfectly partitioned system considered previously when $M = N = 2$. This is because
many of the alternative configurations generated by exchange lead to significant correlations between the
states of the different Y molecules. Nevertheless, we see that the noise remains lower than that of the
well-mixed system, because of the existence of some configurations in which the Y molecules are independent.
However, the appearance of alternate configurations also affects the mean response (Fig. 2A); in particular,
the appearance of configurations in which X and Y molecules do not occupy the same partitions, and hence
no signal can be propagated, means that the maximal output level is reduced. Given this simultaneous change
in both the input-output function and the noise, it is not immediately clear whether signaling reliability is
improved relative to the well-mixed system. Computing the mutual information, we see that the information
transmitted by the system with exchange ($I = 0.213$ bits) is significantly lower than that for the well-mixed
system ($I = 0.332$ bits), showing that the reduction of the output range compromises signal transmission to
an extent which cannot be overcome by the corresponding reduction in noise.

The decrease in information transmission upon incorporating molecule exchange in the system with
$M = N = 2$ is the result of the appearance of suboptimal protein configurations, for which signal propagation
is compromised (or even impossible). However, the number and performance of such configurations will in
general depend on the relative values of $M$, $N$ and $\pi$ (which need not equal $M$ or $N$). While molecule
exchange may make partitioning unfavorable in the extreme case of $M = N = 2$, for systems with higher
protein numbers it can be beneficial to partition the system into $\pi > 1$ compartments, as we will see next.



1.5 An optimal partition size 



To study the performance of systems with higher protein numbers and different partition sizes, we compare
the information transmission, including molecule exchange, for different partition numbers $\pi$ as the number
of proteins in the system is varied while holding $M = N$. Figure 5A shows that for $M = N > 3$ protein
copies, systems with $\pi > 1$ partitions do indeed outperform the well-mixed system. Furthermore, as $M = N$ is
increased the optimal partition number $\pi^*$ also increases such that the optimal number of proteins per partition
$M/\pi^* = N/\pi^* \approx 3$ is roughly constant (Fig. 5B). This result is robust to variations in $\beta$ and $\gamma$: changing
each over several orders of magnitude results in optimal partition sizes in the range $M/\pi^* = N/\pi^* \sim 1$ to $10$
(Appendix C: Fig. 9A and B). The assumption of $M = N$ is also not crucial for this result. In fact, we find
that the value of $M/\pi^*$ has only a weak dependence on $N$ (Appendix C: Fig. 10).

Figure 5: An optimal partition size. A For $M = N > 3$ molecules, a system with $\pi > 1$ partitions achieves
higher information transmission than a well-mixed system ($\pi = 1$). B As $M = N$ is increased the optimal
partition number also increases such that the optimal number of proteins per partition $M/\pi^* = N/\pi^* \approx 3$
is roughly constant. Parameters are as in Fig. 2.

The optimal partition size arises from a trade-off between the reliability and efficiency of signaling. Increasing 
the number of partitions decreases the typical number of proteins per partition, which leads to the beneficial 
effects of a more graded response and reduced noise, increasing signaling reliability. On the other hand, due 
to molecule exchange, reducing the number of molecules per partition also increases the probability that any 
partition contains proteins of only one species that are therefore excluded from the signaling process, which 
leads to a reduced maximal response, reducing signaling efficiency. 
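The efficiency side of this trade-off can be made concrete with a one-line estimate. Assuming, as in the configuration average described earlier, that every molecule is placed in a partition uniformly and independently, the probability that a given Y molecule shares its partition with no X molecule at all is $(1 - 1/\pi)^M$. The function name below is illustrative.

```python
def isolated_fraction(M, n_part):
    """Probability that a given Y molecule sits in a partition containing no X molecule."""
    return (1 - 1 / n_part) ** M

M = 12
for n_part in (1, 2, 4, 6, 12):
    print(n_part, "partitions:", isolated_fraction(M, n_part))
```

At a fixed number of molecules per partition $M/\pi = k$ and large $\pi$, this fraction tends to $e^{-k}$, so roughly 5% of Y molecules are silenced at $k = 3$ but more than a third at $k = 1$, which is why very fine partitioning stops paying off.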

The optimal size revealed by our study of ~1 to 10 molecules per species per partition shows good quantitative
agreement with the observed aggregation of CD59 receptors (3 to 9 molecules [7,8]) and Ras proteins (6 to 8
molecules [9,10]), which each signal via the present motif and are known to interact with rafts and the
cytoskeleton. It is of further interest that a recent experiment in which T cell receptors were artificially
partitioned on supported membranes found that the minimum number of agonist-bound receptors per partition
necessary for downstream signaling is approximately four [20].



1.6 An explicitly spatial model 



Lastly, we confirm that the effects observed in these minimal model systems, where the contents of each
compartment are well-mixed and exchange can occur between any pair of compartments, persist in a more
realistic model in which the diffusion of molecules in space is included explicitly. We simulate the diffusion
and reaction of X and Y molecules on a two-dimensional lattice, as described in Appendix A.3. The system
is partitioned into a number of subdomains by the introduction of diffusion barriers, which are crossed with
a reduced probability $p_{\mathrm{hop}}$ relative to regular diffusion steps on the lattice. Results of such simulations are
shown in Fig. 6.
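A single diffusion attempt with barriers can be sketched as below. The scheme of Appendix A.3 is not reproduced here; the square compartments of side `cell`, the periodic boundaries, and the function name `attempt_hop` are all assumptions made for illustration. A move within a compartment is always accepted, while a move that crosses a compartment boundary is accepted only with probability `p_hop`.

```python
import random

def attempt_hop(pos, lattice_size, cell, p_hop, rng=random):
    """Try to move one molecule to a random nearest-neighbor site.

    The lattice is lattice_size x lattice_size with periodic boundaries (assumed)
    and is tiled by square compartments of side `cell`.
    """
    x, y = pos
    dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
    new_x, new_y = (x + dx) % lattice_size, (y + dy) % lattice_size
    crosses_barrier = (new_x // cell, new_y // cell) != (x // cell, y // cell)
    if crosses_barrier and rng.random() >= p_hop:
        return pos                      # barrier rejects the move
    return (new_x, new_y)

rng = random.Random(0)
pos = (5, 5)
visited = set()
for _ in range(10000):
    pos = attempt_hop(pos, lattice_size=70, cell=10, p_hop=0.001, rng=rng)
    visited.add((pos[0] // 10, pos[1] // 10))
print("compartments visited:", len(visited))
```

Setting `p_hop = 0` confines each molecule permanently to its starting compartment, and `p_hop = 1` removes the barriers, which correspond to the two limits discussed in the text.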

Figure 6A and B reveal that as the strength of the diffusion barriers is increased, the mean response becomes
more graded, and the variance of Y* activity is reduced, analogous to the two effects observed in the
minimal model system (Fig. 2). When $p_{\mathrm{hop}} = 0$, one molecule of each species is permanently confined to a
compartment, producing the graded response predicted for the perfectly partitioned system (Fig. 6A) and
the associated minimal, binomial noise (Fig. 6B). Low but finite $p_{\mathrm{hop}}$ allows exchange of molecules between
neighboring compartments but preserves a separation of timescales between intra- and inter-compartment
mixing. This results in a graded mean response whose maximal level is reduced (Fig. 6A) and reduced
noise (Fig. 6B), precisely the features observed in the minimal model of partitioning with exchange (Fig. 2).
When $p_{\mathrm{hop}} = 1$, there are no barriers, and the system approaches the well-mixed limit (CME). Interestingly,
however, the response remains more graded and the noise remains lower than the predictions of the CME
due to the finite speed of diffusion (Fig. 6A and B), with agreement only reached when the ratio of diffusion
to reaction propensities is much greater than one. This observation reveals that finite diffusion imposes
an effective partitioning even when no actual partitions exist: molecules remain correlated with reaction
partners within a typical distance set by diffusion, but uncorrelated with partners beyond this distance.
As such, in the context of coupled reversible modification, we find that slower diffusion can linearize the
response and reduce the noise, thereby improving information transmission.¹ It is important to emphasize,
however, that the extent of this effect is much smaller than for actual partitioning: Fig. 6B shows that finite
diffusion reduces the maximal noise by $(1.25 - 1)/1.25 = 20\%$, while strong partitioning ($p_{\mathrm{hop}} = 0.001$)
reduces the maximal noise by $(1.25 - 0.4)/1.25 \approx 70\%$. Therefore, partitioning, which introduces not only a
slower effective "hop" diffusion but also a separation of timescales between intra- and inter-compartmental
mixing, is far more effective at conveying an information enhancement.

Fig. 6C confirms that the transmitted information varies non-monotonically with the number of barriers
in a fixed area, indicating that an optimal partition size also appears in systems where space is modeled
explicitly. As in the minimal model, this optimum persists with changes in $\beta$ and $\gamma$, spanning the range
of ~1 to 10 molecules per partition (Appendix C: Fig. 9C and D). Fig. 6C also provides a measure of the
scale of information transmitted by this motif. In absolute terms, the optimal information (1.35 bits) is
consistent with values recently measured for signaling via the TNF-NF-κB pathway (~0.5 to 1.5 bits) [22] and
for patterning in the Drosophila embryo (1.5 ± 0.15 bits) [23]. In relative terms, we see that partitioning
increases information over the unpartitioned system by $(1.35 - 1.04)/1.04 \approx 30\%$ (Fig. 6C) and decreases
the maximal noise by $(1 - 0.4)/1 = 60\%$ (Fig. 6B). Thus, in both absolute and relative terms, we see that
partitioning plays a critical role in producing informative and reliable membrane signaling.

As a final test, we use simulation to confirm that the effects of partitioning persist in the presence of 
features that are more realistic for signaling systems at the membrane, including extrinsic noise in the 
input (Appendix C: Fig. 11) and receptor dimerization (Appendix C: Fig. 12). The fact that the effects 
of partitioning, including the emergence of an optimal partition size, are robust to these details further 
underscores the generality of our findings. 



2 Discussion 



We have seen that the partitioning of a biochemical signaling system into a number of non-interacting
subsystems improves the reliability of signaling via two effects. First, the non-linear response of the network means
that a reduction in the number of input molecules translates into a more graded input-output response.
Second, partitioning significantly reduces the noise in the response by eliminating correlations between the
states of the different output molecules, an effect which, remarkably, overcomes the increase in noise
associated with fewer input molecules in each subsystem. On the other hand, we have seen that the introduction
of diffusion or exchange of molecules between partitions enhances the variance and reduces the range of the
response, thereby reducing signaling performance. This result is due to the presence of configurations in
which the two species are isolated from one another, compromising or even arresting signal transmission in
certain partitions. The interplay between these two effects leads to a partition size that optimizes
information transmission, corresponding to a few molecules per partition on average, in quantitative agreement with
experiments. These effects are generic, and hence the emergence of an optimal partition size is robust to the
specific parameters of the model. Notably, the underlying mechanism revealed here, namely the removal of
correlations, differs fundamentally from that based on cooperativity in protein activation, which has been
argued to underlie optimal cluster size in sensory systems [24,25].

¹ Interestingly, this result is in marked contrast to the case of boundary establishment in embryonic development, where
faster diffusion reduces noise within each nucleus by washing out bursts of gene expression in the input signal [21]. While
in the present system faster diffusion will similarly reduce any super-Poissonian component of the noise within each partition
individually, this averaging does not reduce the noise in the total output across all partitions. In fact, the latter noise is
enhanced with faster diffusion by virtue of increased correlations between partitions.

Figure 6: The effects of partitioning persist in simulations with explicit diffusion. As the probability of
crossing a diffusion barrier $p_{\mathrm{hop}}$ is decreased, A the mean response becomes more graded, and B the output
noise decreases. C The information transmission has a maximum as a function of the partition size. Here
$M = N = 49$, $\beta = 20$, $\gamma = 1$, the system is A = 70 lattice spacings squared, and the ratio of diffusion to
reaction propensities is $p_D/p_r = 1$. In A, $\pi = 49$; in B, $p_{\mathrm{hop}} = 0.001$, and the partition size is varied by
taking from 25 to 1.

Reversible modification reactions are ubiquitous in cell signaling, and interactions with the cytoskeleton and 
lipids provide general mechanisms for the formation of subdomains. We therefore expect the results revealed 
by our study to be applicable to a wide class of signaling systems at the membrane. We have focused in this 
paper on coupled single-site modification reactions because this motif governs pathways specifically known 
to be affected by the formation of membrane sub-domains. However, the effects we uncover also pertain 
to multi-site modification reactions, which are very common in cell signaling [26-29]. Moreover, we have 
focused on systems where the reactant species are confined by a boundary which limits diffusion. However, 
similar effects could be observed in systems where proteins are localized to raft domains, or even scaffolds or 
large macromolecular complexes. In the latter case, each complex would effectively provide an independent 
reaction "compartment," and the exchange between compartments would be the result of rare dissociation 
events, after which proteins could diffuse rapidly through the cytoplasm to a different complex. Even if the 
signal within each complex was not mediated via diffusive encounters, but rather via cooperative or allosteric 
interactions, the fundamental mechanism that we reveal here - that partitioning into subsystems removes 
correlations between subsystems - would still be at play. The presence of scaffolds and macromolecular complexes

at early stages of signaling pathways is extremely common [30], suggesting that the effects discussed here 
are of wide biological relevance. 



3 Methods 



The CME (1) is solved using the method of spectral expansion [15,16]. Details of this method, the computa- 
tion of mutual information, and the spatial simulations are described in Appendix A. Source code, written 
in MATLAB, C++, and Mathematica, used to generate all results and figures in the main text and the SI 
Appendix is freely available at http://partitioning.sourceforge.net. 
A Detailed methods



A.1 Spectral solution of the master equation

The chemical master equation (CME) is solved using the method of spectral expansion [15,16], described
in detail in Appendix B. Briefly, the structure of the CME, in which the dynamics can be separated
into two operators that act only on $m$ or $n$ but not both, allows its solution to be written in the form
$p_{mn}(t) = \sum_{j=0}^{M}\sum_{k=0}^{N} G_{jk}(t;\bar\beta)\,\phi_m^j(\alpha)\,\phi_n^k(\bar\beta)$, where $\phi_m^j(\alpha)$ is the $j$th eigenvector of the operator $\mathcal{L}_m(\alpha, M)$, and
similarly for $\phi_n^k(\bar\beta)$, and $\bar\beta$ is an expansion parameter on which $p_{mn}$ does not ultimately depend. The
expansion coefficients $G_{jk}(t;\bar\beta)$ can be calculated straightforwardly, as shown in Appendix B. Importantly,
this spectral expansion dramatically decreases the computational complexity of calculating $p_{mn}$: rather than
solving the $(M+1)(N+1) \times (M+1)(N+1)$ system of the original CME, it is only necessary to solve
$N$ linear systems of size $(M+1) \times (M+1)$ for the vectors of coefficients $\vec G_k$. We emphasize that since
the system has a finite state space, no approximations are made in using the spectral expansion, and the
solution remains exact. Furthermore, the moments of the steady-state distribution $p_{mn}$ can be conveniently
expressed in terms of the expansion coefficients $G_{jk}$; in particular, $\langle n\rangle = G_{01}$ and $\langle n^2\rangle = 2G_{02} + G_{01}$.



A.2 Mutual information

The mutual information between network input and response is given by the standard expression [18]
$I[\alpha, n] = \langle \log\{p(\alpha, n)/[p(\alpha)p(n)]\}\rangle$, where the average is taken over the joint distribution $p(\alpha, n) = p(n|\alpha)p(\alpha)$,
and $p(n|\alpha) = \sum_{m=0}^{M} p(m, n|\alpha)$ is given by the steady state of the CME. The calculation of the mutual
information requires specification of the distribution of input signals $p(\alpha)$. We choose $N_\alpha$ values of $\alpha$ such
that $q \equiv \alpha/(\alpha+1) = \langle m\rangle/M$ is uniformly spaced over the range $0 < q < 1$; then $p(n) = \sum_{i=1}^{N_\alpha} p(n|\alpha_i)p(\alpha_i)$
and $p(\alpha_i) = N_\alpha^{-1}$. However, our conclusions are unaffected if we instead take an input distribution that is
unimodal or bimodal (Fig. 13). We take $N_\alpha > 30$, for which $I[\alpha, n]$ converges to within 1% of its large-$N_\alpha$
limit (Fig. 14).
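
The mutual information for equally likely inputs can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the function name `mutual_information`, the input layout (one row of $p(n|\alpha_i)$ per input value), and the choice of base-2 logarithms are assumptions made here for concreteness.

```python
import math


def mutual_information(p_n_given_a):
    """Mutual information (in bits) between input and output.

    p_n_given_a[i][n] holds p(n | alpha_i); every row must sum to 1.
    The inputs are taken to be equally likely, p(alpha_i) = 1 / N_alpha.
    """
    n_inputs = len(p_n_given_a)
    n_states = len(p_n_given_a[0])
    p_a = 1.0 / n_inputs
    # Marginal output distribution p(n) = sum_i p(n | alpha_i) p(alpha_i).
    p_n = [sum(row[n] for row in p_n_given_a) * p_a for n in range(n_states)]
    info = 0.0
    for row in p_n_given_a:
        for n, p_cond in enumerate(row):
            if p_cond > 0.0:  # zero-probability terms contribute nothing
                info += p_a * p_cond * math.log2(p_cond / p_n[n])
    return info
```

Two limiting cases make a useful check: two inputs that map deterministically onto two distinct outputs carry one bit, while identical response distributions for every input carry none.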



A.3 Spatial simulations

The diffusion and reactions of $M$ $X$ molecules and $N$ $Y$ molecules are simulated on a two-dimensional square
lattice of side length $\lambda$ using a fixed-time-step integration scheme. During each step of duration $\delta t$, each
particle is moved to a random neighboring lattice site with probability $p_D = (D/\ell^2)\delta t$, where $D$ is the
diffusion constant, and $\ell$ is the lattice spacing. Molecules have steric interactions on the lattice, such that
only one molecule can be present at each lattice site at any time. Attempted moves to an occupied site are
rejected, with the particle remaining at its original position. If a molecule in the $X^*$ state is adjacent to a
molecule in the $Y$ state, the latter is converted to the $Y^*$ state with probability $p_r = \gamma(\beta\lambda^2/M)\delta t$. To make
$\pi$ partitions, linear diffusion barriers are placed at $i\lambda/\sqrt{\pi}$ in each direction, where $i \in \{0, 1, \dots, \sqrt{\pi} - 1\}$. A
diffusion step which crosses such a barrier is accepted with probability reduced by a factor $p_{\rm hop}$. The time
step $\delta t$ is chosen sufficiently small that no probability exceeds one.
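
The barrier rule can be illustrated with a small helper. This is only a sketch under stated assumptions: sites are indexed by integers from 0 to $\lambda - 1$ along each axis, a site belongs to the partition obtained by integer division, and the names `partition_of` and `hop_probability` are hypothetical. Occupancy checks and the treatment of the lattice edges are not specified here and are left out.

```python
def partition_of(site, lam, root_pi):
    """Return the (column, row) index of the partition containing a site.

    Barriers sit at i * lam / root_pi along each axis, so a coordinate x
    falls in partition floor(x * root_pi / lam).
    """
    x, y = site
    return (x * root_pi // lam, y * root_pi // lam)


def hop_probability(src, dst, lam, root_pi, p_d, p_hop):
    """Acceptance probability for a hop from src to a free neighbor dst."""
    if partition_of(src, lam, root_pi) != partition_of(dst, lam, root_pi):
        return p_d * p_hop  # the step crosses a diffusion barrier
    return p_d
```

For example, with `lam = 70` and `root_pi = 7` the barriers fall at multiples of 10, so a hop from `(9, 0)` to `(10, 0)` is suppressed by `p_hop` while a hop from `(10, 0)` to `(11, 0)` is not.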



B Solution of the master equation by spectral expansion 

This section describes the solution via the method of spectral expansion, or the 'spectral method', of the CME 
introduced in the main text. The spectral method has been used fruitfully in the context of gene regulation 
to solve CMEs describing cascades [15], bursts [16], and oscillations [31], and a pedagogical treatment is 
available in [32] . Here we apply the spectral method to coupled reversible switching. 

From Eqns. 1-2 of the main text, the stochastic dynamics of the system under study are given by the CME

$$\dot p_{mn} = -\left[\mathcal{L}_m(\alpha, M) + \gamma\,\mathcal{L}_n(\beta_m, N)\right] p_{mn}, \tag{7}$$

where both operators $\mathcal{L}_m$ and $\mathcal{L}_n$ have the form

$$\mathcal{L}_m(\alpha, M) = \alpha\left[1 - E_m^{-1}\right](M - m) + \left[1 - E_m^{1}\right] m, \tag{8}$$

with $E_m^i f(m) = f(m + i)$ defining the step operator. The CME describes the evolution of the probability of
having $m$ $X$ proteins in the active state and $n$ $Y$ proteins in the active state, with $\beta_m$ the coupling function
by which $X$ drives the activation of $Y$.



B.l The moments do not close 

We first demonstrate that direct computation of the moments from the CME is not possible because the
moments do not close. The reason is that a nonlinearity is present in the first term of Eqn. 8 in the form of
the factor $\beta_m n$. As a result, the first moment depends on a higher moment, which in turn depends on an
even higher moment, and so on.

To see explicitly that the moments do not close, we consider computing the dynamics of the first moment of
the driven species, the mean $\langle n\rangle$, by summing the CME over $m$ and $n$ against $n$. We obtain

$$\frac{1}{\gamma}\,\partial_t\langle n\rangle = -\langle n\rangle + N\langle\beta_m\rangle - \langle\beta_m n\rangle, \tag{9}$$

where averages are taken over $p_{mn}$. We see that indeed the final term carries the nonlinearity. Even for the
simplest coupling function, i.e. linear coupling $\beta_m = cm$, one finds a hierarchy of moment dependencies that
does not close:

\begin{align}
\partial_t\langle n\rangle &= -\gamma\langle n\rangle + \gamma cN\langle m\rangle - \gamma c\langle mn\rangle, \tag{10}\\
\partial_t\langle mn\rangle &= \alpha M\langle n\rangle - (\alpha + \gamma + 1)\langle mn\rangle + \gamma cN\langle m^2\rangle - \gamma c\langle m^2 n\rangle, \tag{11}\\
\partial_t\langle m^2 n\rangle &= \dots \tag{12}
\end{align}

That is, the dynamics of $\langle n\rangle$ depend on $\langle mn\rangle$, whose dynamics depend on $\langle m^2 n\rangle$, and so on.



The fact that the moments cannot be computed directly from the CME (indeed, not even the mean output $\langle n\rangle$) makes it
particularly important to actually solve the CME in order to learn about the statistical properties of this system.
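
For very small systems the CME can also be solved by brute force, which gives a useful cross-check on any moment relation. The sketch below is an illustration only, not the spectral method of this appendix: it assumes linear coupling $\beta_m = cm$, expects the rates as `fractions.Fraction` values so the arithmetic is exact, and uses hypothetical names (`cme_steady_state`, `eqn10_residual`). It builds the full generator over all $(M+1)(N+1)$ states, replaces one balance equation by normalization, and solves the linear system by Gauss-Jordan elimination.

```python
from fractions import Fraction


def cme_steady_state(M, N, alpha, gamma, c):
    """Exact steady state p[m][n] for coupling beta_m = c * m."""
    size = (M + 1) * (N + 1)

    def idx(m, n):
        return m * (N + 1) + n

    A = [[Fraction(0)] * size for _ in range(size)]  # dp/dt = A p

    def add(src, dst, rate):
        A[dst][src] += rate
        A[src][src] -= rate

    for m in range(M + 1):
        for n in range(N + 1):
            s = idx(m, n)
            if m < M:
                add(s, idx(m + 1, n), alpha * (M - m))          # X -> X*
            if m > 0:
                add(s, idx(m - 1, n), Fraction(m))              # X* -> X
            if n < N:
                add(s, idx(m, n + 1), gamma * c * m * (N - n))  # Y -> Y*
            if n > 0:
                add(s, idx(m, n - 1), gamma * n)                # Y* -> Y

    # Replace one (redundant) balance equation by sum(p) = 1.
    A[-1] = [Fraction(1)] * size
    b = [Fraction(0)] * (size - 1) + [Fraction(1)]

    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(size):
        pivot = next(r for r in range(col, size) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(size):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
                b[r] -= f * b[col]
    p = [b[i] / A[i][i] for i in range(size)]
    return [[p[idx(m, n)] for n in range(N + 1)] for m in range(M + 1)]


def eqn10_residual(p, c):
    """Right-hand side of Eqn. 10 divided by gamma, evaluated on p."""
    M, N = len(p) - 1, len(p[0]) - 1
    states = [(m, n) for m in range(M + 1) for n in range(N + 1)]
    mean_n = sum(n * p[m][n] for m, n in states)
    mean_m = sum(m * p[m][n] for m, n in states)
    mean_mn = sum(m * n * p[m][n] for m, n in states)
    return -mean_n + c * N * mean_m - c * mean_mn
```

At steady state the residual vanishes, which shows how $\langle n\rangle$ is tied to the higher moment $\langle mn\rangle$ rather than being determined on its own. The cost of this approach grows with the full state space, which is the scaling the spectral method avoids.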



B.2 The spectrum of the switch operator 

The CME is a linear equation. Even when the rates are nonlinear functions of the molecule numbers, the
CME is still linear in its degree of freedom, the joint probability. The most straightforward way to solve a
linear equation is to write its solution as an expansion in the eigenfunctions of the linear operator. Although it
is difficult to derive the eigenfunctions of the coupled operator $\mathcal{L}_m(\alpha, M) + \gamma\mathcal{L}_n(\beta_m, N)$, it is straightforward
to derive the eigenfunctions of the uncoupled operator $\mathcal{L}_m(\alpha, M)$, which we call the switch operator. Indeed,
we will see that expanding the joint probability in eigenfunctions of the uncoupled operator greatly simplifies
the form of the CME, yielding an exact solution in terms of matrix algebra.

The switch operator governs the CME for the first species $X$; explicitly,

$$\dot p_m = -\mathcal{L}p_m = \alpha[M - (m-1)]\,p_{m-1} + (m+1)\,p_{m+1} - [\alpha(M-m) + m]\,p_m, \tag{13}$$

where for notational simplicity we have taken $\mathcal{L}_m(\alpha, M) \to \mathcal{L}$. Its eigenvalue relation is written

$$\mathcal{L}\phi_m^j = \lambda_j\phi_m^j, \tag{14}$$

for eigenvalues $\lambda_j$ and eigenvectors $\phi_m^j$.



B.2.1 Eigenvalues 



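The matrix form of the operator $\mathcal{L}$ can be read directly from Eqn. 13:

$$\mathbf{L} = \begin{pmatrix}
M\alpha & -1 & & & & & \\
-M\alpha & (M-1)\alpha + 1 & -2 & & & & \\
 & -(M-1)\alpha & (M-2)\alpha + 2 & -3 & & & \\
 & & \ddots & \ddots & \ddots & & \\
 & & & -3\alpha & 2\alpha + (M-2) & -(M-1) & \\
 & & & & -2\alpha & \alpha + (M-1) & -M \\
 & & & & & -\alpha & M
\end{pmatrix}. \tag{15}$$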



The tridiagonal structure follows from the fact that molecule numbers only increase or decrease by one
at a time. Practically speaking, the eigenvalues can be obtained using the fact that the determinant of a
tridiagonal matrix can be computed recursively. Performing the computation for $M = 0, 1, 2, \dots$ reveals the
pattern

$$\lambda_j = (\alpha + 1)j, \qquad j \in \{0, 1, 2, \dots, M\}. \tag{16}$$

However, Eqn. 16 can be derived more rigorously by making use of a generating function. We present
this derivation next, since the generating function formalism will also prove quite useful in deriving the
eigenvectors and solving the CME.



The generating function is an expansion in any complete basis for which the probability distribution provides
the expansion coefficients [33]. Choosing as our basis the set of polynomials in some continuous variable $x$,
the generating function is defined

$$G(x) = \sum_{m=0}^{M} p_m x^m. \tag{17}$$

The probability distribution is recovered via the inverse transform

$$p_m = \frac{1}{m!}\,\partial_x^m\left[G(x)\right]_{x=0}. \tag{18}$$

A key utility of the generating function is turning the CME, which is a set of ordinary differential equations
(ODEs), into a single partial differential equation. Indeed, summing Eqn. 13 against $x^m$ yields

$$\dot G = -(x-1)\left[(\alpha x + 1)\partial_x - \alpha M\right]G, \tag{19}$$

where the appearances of $x$ and $\partial_x$ arise from the shifts $m-1$ and $m+1$, respectively. Eqn. 19 directly gives
the form of the operator in $x$ space: $\mathcal{L} = (x-1)[(\alpha x+1)\partial_x - \alpha M]$. The eigenfunctions are then obtained
from the relation $\mathcal{L}\phi_j(x) = \lambda_j\phi_j(x)$ by separating variables and integrating:

$$\phi_j(x) = (\alpha+1)^{-M}(x-1)^{\lambda_j/(\alpha+1)}(\alpha x+1)^{M-\lambda_j/(\alpha+1)}. \tag{20}$$

Here, the constant factor $(\alpha+1)^{-M}$ is determined by application of the normalization condition $G(1) = 1$
to the steady state solution, which is obtained by setting $\lambda_j = 0$:

$$G(x) = \left(\frac{\alpha x + 1}{\alpha + 1}\right)^{M}. \tag{21}$$

We will solve Eqn. 19 in two ways: by the method of characteristics and by expansion in the eigenfunctions;
together these solutions will reveal the eigenvalues.

First, the method of characteristics [34] posits that the dependence of $G$ on $x$ and $t$ occurs via some parametric
variable $s$, i.e. $G(x,t) = G[x(s), t(s)]$. The chain rule then gives $dG/ds = (\partial G/\partial x)(dx/ds) + (\partial G/\partial t)(dt/ds)$,
which when compared term by term with Eqn. 19 yields three ordinary differential equations:

$$\frac{dt}{ds} = 1, \qquad \frac{dx}{ds} = (x-1)(\alpha x + 1), \qquad \frac{dG}{ds} = \alpha M(x-1)\,G. \tag{22}$$

The first identifies $s = t$, with which the second is solved by

$$z = \frac{x-1}{\alpha x+1}\,e^{-(\alpha+1)t}, \tag{23}$$

where $z$ is a constant of integration. The crux of the method is that Eqn. 23 defines a characteristic curve
on which $G$ must depend, i.e. $G(x,t) = f[z(x,t)]\,g(x,t)$, where $f$ and $g$ are unknown functions, and $z$ has
been promoted to a characteristic function of $x$ and $t$. The function $g$ is identified by realizing that steady
state is reached as $t \to \infty$, for which $f(z) \to f(0)$ no longer depends on $x$ or $t$. Therefore, $g$ must be the
steady state function given in Eqn. 21:

$$G(x,t) = \left(\frac{\alpha x+1}{\alpha+1}\right)^{M} f[z(x,t)]. \tag{24}$$

Although we still do not know $f$, we may Taylor expand it around the point $z = 0$, yielding

$$G(x,t) = \left(\frac{\alpha x+1}{\alpha+1}\right)^{M}\sum_{j=0}^{\infty} c_j\left(\frac{x-1}{\alpha x+1}\right)^{j} e^{-(\alpha+1)jt}, \tag{25}$$

where $c_j = \partial_z^j[f(z)]_{z=0}/j!$.



Second, because Eqn. 19 is linear, we may also write down its solution as an expansion in the eigenfunctions
of its linear operator:

$$G(x,t) = \sum_j C_j(t)\,\phi_j(x). \tag{26}$$

Under the assumption that the eigenfunctions are orthogonal (which will be shown in the next section),
inserting Eqn. 26 into Eqn. 19 yields an independent ODE for each $C_j$, $\dot C_j = -\lambda_j C_j$, which is solved by
$C_j(t) = c_j e^{-\lambda_j t}$ for initial conditions $c_j$. Inserting this functional form and that for $\phi_j(x)$ (Eqn. 20) into Eqn.
26 yields

$$G(x,t) = \left(\frac{\alpha x+1}{\alpha+1}\right)^{M}\sum_j c_j\left(\frac{x-1}{\alpha x+1}\right)^{\lambda_j/(\alpha+1)} e^{-\lambda_j t}. \tag{27}$$

Comparison of Eqns. 25 and 27 reveals both the expression for the eigenvalues, $\lambda_j = (\alpha+1)j$, and a limit
on their domain, the nonnegative integers $j \in \{0, 1, 2, \dots, \infty\}$. Of course, the domain can be a subset of
the nonnegative integers; then some $c_j$ in Eqn. 27 would be zero. Indeed, since $\mathbf{L}$ is a finite matrix of size
$M+1$ by $M+1$ (Eqn. 15), it is spanned by $M+1$ linearly independent eigenvectors, meaning we expect only
$M+1$ eigenvalues. In fact, the only set of $M+1$ nonnegative integers that satisfies the requirement that
the trace of $\mathbf{L}$, $\sum_{m=0}^{M}[(M-m)\alpha + m] = (\alpha+1)M(M+1)/2$, equals the sum of the eigenvalues, $\sum_j(\alpha+1)j$,
is $j \in \{0, 1, 2, \dots, M\}$. Thus, we arrive at the result

$$\lambda_j = (\alpha+1)j, \qquad j \in \{0, 1, 2, \dots, M\}, \tag{28}$$

as proposed by inspection in Eqn. 16. 
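
As a quick sanity check of this result, the smallest nontrivial case $M = 1$ can be worked by hand. The following is an illustrative calculation added for concreteness, using the $2 \times 2$ instance of the matrix in Eqn. 15 with $\alpha$ the activation rate of the switch; it is not part of the derivation above.

```latex
\begin{align}
\mathbf{L} &= \begin{pmatrix} \alpha & -1 \\ -\alpha & 1 \end{pmatrix}, \\
\det(\mathbf{L} - \lambda\mathbf{I}) &= (\alpha - \lambda)(1 - \lambda) - \alpha
  = \lambda\,[\lambda - (\alpha + 1)], \\
\lambda_0 &= 0, \qquad \lambda_1 = \alpha + 1, \qquad
\operatorname{tr}\mathbf{L} = \alpha + 1 = \lambda_0 + \lambda_1 .
\end{align}
```

Both the eigenvalues and the trace condition agree with $\lambda_j = (\alpha+1)j$ for $j \in \{0, 1\}$.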



B.2.2 State space notation 



The linear algebraic manipulations we have done thus far can be cast in the more abstract notation of state
spaces, commonly used in quantum mechanics [35]. We will find this notation useful in later sections, for
example in transforming between the molecule number basis and the eigenbasis. Specifically, we introduce
a state $|p\rangle$ that can be projected into $\langle m|$ space to give the probability distribution, or into $\langle x|$ space to give
the generating function:

$$\langle m|p\rangle = p_m, \qquad \langle x|p\rangle = G(x). \tag{29}$$

In the same way, the $j$th eigenstate $|j\rangle$ is projected into $\langle m|$ space to give the $j$th eigenvector, or into $\langle x|$
space to give the $j$th eigenfunction:

$$\langle m|j\rangle = \phi_m^j, \qquad \langle x|j\rangle = \phi_j(x). \tag{30}$$

This notation offers new insight into our definition of the generating function. For example, Eqn. 17 can
now be written

$$\langle x|p\rangle = \sum_{m=0}^{M}\langle x|m\rangle\langle m|p\rangle, \tag{31}$$

where we have recognized

$$\langle x|m\rangle = x^m \tag{32}$$

as the projection of the state $|m\rangle$ into $\langle x|$ space. Eqn. 31 has a clear interpretation: we have inserted a
complete set of $|m\rangle$ states. Similarly, Eqn. 18 can now be written

$$\langle m|p\rangle = \oint\frac{dx}{2\pi i}\,\frac{G(x)}{x^{m+1}} = \oint\frac{dx}{2\pi i}\,\langle m|x\rangle\langle x|p\rangle. \tag{33}$$

In the first step, we have rewritten Eqn. 18 using Cauchy's theorem, where the measure carries the factor $1/2\pi i$ and the contour
surrounds the pole at $x = 0$. In the second step, we have recognized

$$\langle m|x\rangle = \frac{1}{x^{m+1}} \tag{34}$$

as the conjugate to $\langle x|m\rangle$. Eqn. 33 has the clear interpretation of inserting a complete set of $|x\rangle$ states, under
an inner product defined by the complex integration. The choice of inner product and of conjugate state
are made such that orthonormality is preserved, a fact which we may confirm by again employing Cauchy's
theorem:

$$\langle m|m'\rangle = \oint\frac{dx}{2\pi i}\,\langle m|x\rangle\langle x|m'\rangle = \oint\frac{dx}{2\pi i}\,\frac{x^{m'}}{x^{m+1}} = \frac{1}{m!}\,\partial_x^m\left[x^{m'}\right]_{x=0}\theta(m \ge 0) = \delta_{mm'}. \tag{35}$$

Finally, the dynamics in Eqn. 19 can be written in state space as

$$|\dot p\rangle = -\hat{\mathcal{L}}|p\rangle = -(\hat a^+ - 1)\left[(\alpha\hat a^+ + 1)\hat a^- - \alpha M\right]|p\rangle, \tag{36}$$

where we have defined the operators $\hat a^+$ and $\hat a^-$ whose projections in $x$ space are $\langle x|\hat a^+ = x\langle x|$ and $\langle x|\hat a^- = \partial_x\langle x|$.
These are analogous to the raising and lowering operators in the well known treatment of the quantum
harmonic oscillator. This operator formalism for the generating function was first developed in the 1970s;
for a review see [36].



B.2.3 Eigenvectors 



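The state space notation facilitates a derivation of the functional form of the eigenvectors:

$$\phi_m^j = \langle m|j\rangle = \oint\frac{dx}{2\pi i}\,\langle m|x\rangle\langle x|j\rangle = \oint\frac{dx}{2\pi i}\,\frac{1}{x^{m+1}}\,\frac{(x-1)^j(\alpha x+1)^{M-j}}{(\alpha+1)^M}. \tag{37}$$

Here we have inserted the eigenfunctions

$$\phi_j(x) = \langle x|j\rangle = \frac{(x-1)^j(\alpha x+1)^{M-j}}{(\alpha+1)^M} \tag{38}$$

from Eqn. 20, with eigenvalues given by Eqn. 28. We use Cauchy's theorem to perform the integration and
recognize that derivatives of a product follow a binomial expansion:

\begin{align}
\phi_m^j &= \frac{1}{(\alpha+1)^M}\,\frac{1}{m!}\,\partial_x^m\left[(\alpha x+1)^{M-j}(x-1)^j\right]_{x=0} \tag{39}\\
&= \frac{1}{(\alpha+1)^M}\,\frac{1}{m!}\sum_{\ell=0}^{m}\binom{m}{\ell}\,\partial_x^{\ell}\left[(\alpha x+1)^{M-j}\right]_{x=0}\,\partial_x^{m-\ell}\left[(x-1)^j\right]_{x=0} \tag{40}\\
&= \frac{1}{(\alpha+1)^M}\,\frac{1}{m!}\sum_{\ell=0}^{m}\frac{m!}{\ell!\,(m-\ell)!}\,\frac{(M-j)!\,\alpha^{\ell}}{(M-j-\ell)!}\,\theta(\ell \le M-j)\,\frac{j!\,(-1)^{j-m+\ell}}{(j-m+\ell)!}\,\theta(m-\ell \le j) \tag{41}\\
&= \frac{(-1)^{j-m}}{(\alpha+1)^M}\sum_{\ell\in\Omega}\binom{M-j}{\ell}\binom{j}{m-\ell}(-\alpha)^{\ell}. \tag{42}
\end{align}

Here the domain $\Omega$ results from the derivatives and is defined by $\max(0, m-j) \le \ell \le \min(m, M-j)$. Eqn.
42 gives the expression for the eigenvectors. For $j = 0$ the expression reduces to the binomial distribution
in terms of the occupancy $q \equiv \alpha/(\alpha+1)$, as it must, since this is the steady state of the uncoupled process:

$$\phi_m^0 = \binom{M}{m}\frac{\alpha^m}{(\alpha+1)^M} = \binom{M}{m}q^m(1-q)^{M-m}. \tag{43}$$

This function has one maximum, and in general the $j$th eigenvector has $j+1$ extrema, making the eigenvectors
qualitatively similar to Fourier modes or eigenfunctions of the quantum harmonic oscillator.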
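
The closed form for the eigenvectors can be checked against the tridiagonal matrix directly. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, $\alpha$ is passed as a `fractions.Fraction` so that the comparison is exact, and the components are the coefficients of $x^m$ in $(x-1)^j(\alpha x+1)^{M-j}/(\alpha+1)^M$.

```python
from fractions import Fraction
from math import comb


def switch_matrix(M, alpha):
    """Tridiagonal matrix of the switch operator for M molecules."""
    L = [[Fraction(0)] * (M + 1) for _ in range(M + 1)]
    for m in range(M + 1):
        L[m][m] = alpha * (M - m) + m
        if m > 0:
            L[m][m - 1] = -alpha * (M - m + 1)
        if m < M:
            L[m][m + 1] = Fraction(-(m + 1))
    return L


def eigenvector(j, M, alpha):
    """Components phi_m^j from the double-binomial closed form."""
    norm = (alpha + 1) ** M
    phi = []
    for m in range(M + 1):
        total = Fraction(0)
        for l in range(max(0, m - j), min(m, M - j) + 1):
            total += comb(M - j, l) * comb(j, m - l) * (-alpha) ** l
        sign = -1 if (j - m) % 2 else 1  # (-1)^(j - m) without float powers
        phi.append(sign * total / norm)
    return phi


def is_eigenpair(j, M, alpha):
    """True if L phi^j equals (alpha + 1) * j * phi^j exactly."""
    L = switch_matrix(M, alpha)
    phi = eigenvector(j, M, alpha)
    L_phi = [sum(L[m][k] * phi[k] for k in range(M + 1)) for m in range(M + 1)]
    return L_phi == [(alpha + 1) * j * x for x in phi]
```

Calling `is_eigenpair(j, M, alpha)` for every `j` from 0 to `M` confirms both the eigenvector formula and the eigenvalues $(\alpha+1)j$ for a given small system; `eigenvector(0, M, alpha)` should reproduce the binomial distribution.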
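The switch operator $\mathcal{L}$ is not Hermitian. A consequence is that its conjugate eigenvectors $\psi_m^j = \langle j|m\rangle$
(row vectors) are not complex conjugates of its eigenvectors $\phi_m^j = \langle m|j\rangle$ (column vectors). Rather, they
are distinct functions that must be constructed to obey an orthonormality relation in order to constitute a
complete basis. The orthonormality relation can be used to derive their form in $x$ space, $\psi_j(x) = \langle j|x\rangle$:

$$\delta_{jj'} = \langle j|j'\rangle = \oint\frac{dx}{2\pi i}\,\langle j|x\rangle\langle x|j'\rangle = \oint\frac{dx}{2\pi i}\,\psi_j(x)\,\frac{(x-1)^{j'}(\alpha x+1)^{M-j'}}{(\alpha+1)^M} = \oint\frac{dz_0}{2\pi i}\,z_0^{j'}f_j(z_0). \tag{44}$$

Here we have defined $z_0 \equiv (x-1)/(\alpha x+1)$ and $f_j(z_0) \equiv \psi_j(x)(\alpha x+1)^{M+2}/(\alpha+1)^{M+1}$ in order to draw an
equivalence between Eqn. 44 and Eqn. 35, which then implies $f_j(z_0) = 1/z_0^{j+1} = (\alpha x+1)^{j+1}/(x-1)^{j+1}$, or

$$\psi_j(x) = \frac{(\alpha+1)^{M+1}}{(\alpha x+1)^{M-j+1}(x-1)^{j+1}}. \tag{45}$$

Eqn. 45 gives the form of the conjugate eigenfunctions in $x$ space, which can be used to derive the expression
for the conjugate eigenvectors as in Eqns. 37-42, with the contour now surrounding the pole at $x = 1$:

\begin{align}
\psi_m^j = \langle j|m\rangle &= \oint\frac{dx}{2\pi i}\,\langle j|x\rangle\langle x|m\rangle = \oint\frac{dx}{2\pi i}\,\frac{(\alpha+1)^{M+1}\,x^m}{(\alpha x+1)^{M-j+1}(x-1)^{j+1}} \tag{46}\\
&= \sum_{\ell\in\Omega'}\binom{m}{j-\ell}\binom{M-j+\ell}{\ell}(-\alpha)^{\ell}(\alpha+1)^{j-\ell}. \tag{47}
\end{align}

Here $\Omega'$ is defined by $\max(0, j-m) \le \ell \le j$. Eqn. 47 gives the expression for the conjugate eigenvectors.
They are $j$th order polynomials in $m$.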



B.3 Expanding the coupled problem in uncoupled eigenfunctions 

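We now solve the CME by expanding the solution in the eigenfunctions of the uncoupled operator. This
procedure is most easily done in state space, in which the CME reads

$$|\dot p\rangle = -\left[\hat{\mathcal{L}}_x(\alpha) + \gamma\hat{\mathcal{L}}_{xy}\right]|p\rangle, \tag{48}$$

where

\begin{align}
\hat{\mathcal{L}}_x(\alpha) &= (\hat a_x^+ - 1)\left[(\alpha\hat a_x^+ + 1)\hat a_x^- - \alpha M\right], \tag{49}\\
\hat{\mathcal{L}}_{xy} &= (\hat a_y^+ - 1)\left[(\hat\beta_x\hat a_y^+ + 1)\hat a_y^- - \hat\beta_x N\right], \tag{50}
\end{align}

as in Eqn. 36, and we have introduced the operator $\hat\beta_x$ whose action on the state $|m\rangle$ yields the coupling
function, $\hat\beta_x|m\rangle = \beta_m|m\rangle$. The first step is to write the full operator as two uncoupled operators plus
a correction term. Introducing the constant $\bar\beta$ to parameterize the second uncoupled operator, the CME
becomes

$$|\dot p\rangle = -\left[\hat{\mathcal{L}}_x(\alpha) + \gamma\hat{\mathcal{L}}_y(\bar\beta) + \gamma\hat\Gamma_x\hat\Delta_y\right]|p\rangle, \tag{51}$$

where we have explicitly denoted the fact that the correction term $\hat{\mathcal{L}}_{xy} - \hat{\mathcal{L}}_y(\bar\beta)$ factorizes into two operators
that act on each of the $x$ and $y$ sectors alone:

\begin{align}
\hat\Gamma_x &\equiv \hat\beta_x - \bar\beta, \tag{52}\\
\hat\Delta_y &\equiv (\hat a_y^+ - 1)(\hat a_y^+\hat a_y^- - N). \tag{53}
\end{align}

The second step is to expand the solution in the eigenfunctions of the two uncoupled operators. Introducing
$k$ as the mode index for the eigenstates of $\hat{\mathcal{L}}_y(\bar\beta)$, we write

$$|p\rangle = \sum_{j=0}^{M}\sum_{k=0}^{N} G_{jk}\,|j, k\rangle. \tag{54}$$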
Inserting this form into the CME and projecting with the conjugate state $\langle j, k|$
yields the dynamics for the expansion coefficients $G_{jk}$:

$$\dot G_{jk} = -\left[(\alpha+1)j + \gamma(\bar\beta+1)k\right]G_{jk} - \gamma\sum_{j'=0}^{M}\Gamma_{jj'}\sum_{k'=0}^{N}\Delta_{kk'}G_{j'k'}. \tag{55}$$

Here the first term is diagonal and reflects the actions of the uncoupled operators on their eigenstates. The
second term contains the corrections $\Gamma_{jj'} = \langle j|\hat\Gamma_x|j'\rangle$ and $\Delta_{kk'} = \langle k|\hat\Delta_y|k'\rangle$. The first correction is directly
evaluated by inserting a complete set of $m$ states:

\begin{align}
\Gamma_{jj'} &= \sum_{m=0}^{M}\langle j|(\hat\beta_x - \bar\beta)|m\rangle\langle m|j'\rangle = \sum_{m=0}^{M}\langle j|m\rangle(\beta_m - \bar\beta)\langle m|j'\rangle \tag{56}\\
&= \sum_{m=0}^{M}\psi_m^j(\beta_m - \bar\beta)\phi_m^{j'}. \tag{57}
\end{align}

We see that $\Gamma_{jj'}$ is simply the difference between the coupling function and the constant parameter,
rotated into eigenspace. Notably, for linear coupling, $\Gamma_{jj'}$ is tridiagonal (see Sec. B.5). The second correction
is most easily evaluated by inserting a complete set of $y$ states; the result, derived in Sec. B.5, is

$$\Delta_{kk'} = k\,\delta_{kk'} - (N-k+1)\,\delta_{k-1,k'}. \tag{58}$$

We see that $\Delta_{kk'}$ is subdiagonal in $k$, which simplifies the dynamics of $G_{jk}$ to

$$\dot G_{jk} = -\sum_{j'=0}^{M}\Lambda^k_{jj'}G_{j'k} + \gamma(N-k+1)\sum_{j'=0}^{M}\Gamma_{jj'}G_{j',k-1}, \tag{59}$$

where we define the matrix acting on the diagonal part as

$$\Lambda^k_{jj'} \equiv \left[(\alpha+1)j + \gamma(\bar\beta+1)k\right]\delta_{jj'} + \gamma k\,\Gamma_{jj'}. \tag{60}$$

The subdiagonality allows one to write the steady state of Eqn. 59 as an iterative scheme, by which the $k$th
column of $G_{jk}$ is computed from the $(k-1)$th column:

$$\vec G_k = \gamma(N-k+1)\,\Lambda_k^{-1}\,\Gamma\,\vec G_{k-1}. \tag{61}$$

The scheme is initialized with

$$\vec G_0 = \delta_{j0} \tag{62}$$

(see Sec. B.5), and the joint distribution is recovered via

$$p_{mn} = \sum_{j=0}^{M}\sum_{k=0}^{N} G_{jk}\,\phi_m^j\,\phi_n^k, \tag{63}$$

which is the projection of Eqn. 54 into $\langle m, n|$ space.

Eqn. 63 constitutes an exact steady state solution to the CME, with $G_{jk}$ computed iteratively via Eqns.
61 and 62, auxiliary matrices defined in Eqns. 57 and 60, and the eigenvectors given by Eqns. 42 and 47.
Importantly, the computational complexity of the solution has been dramatically reduced: rather than solving
the original CME (Eqn. 7), which requires inverting its operator of size $(M+1)(N+1) \times (M+1)(N+1)$,
Eqn. 61 makes clear that it is only necessary to invert $N$ smaller matrices of size $(M+1) \times (M+1)$, i.e. the
matrices $\Lambda_k$ for $k \in \{1, 2, \dots, N\}$.
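B.4 Exact expressions for moments

Now that we have an exact solution to the CME in terms of a spectral expansion, moments take an exact
form in terms of the expansion coefficients. We thus circumvent the problem of moment closure, instead
arriving at compact expressions that require only the inversion and multiplication of finite matrices via Eqn.
61.

Moments are most easily computed from the generating function, $G(x, y)$. For example, the $\nu$th moment of
the output is

$$\langle n^\nu\rangle = \left[(y\partial_y)^\nu G(x=1, y)\right]_{y=1}. \tag{64}$$

In terms of the expansion, the generating function is $G(x, y) = \langle x, y|p\rangle = \sum_{j=0}^{M}\sum_{k=0}^{N}G_{jk}\langle x|j\rangle\langle y|k\rangle$, and
using the fact that $\langle x=1|j\rangle = \delta_{j0}$ (Eqn. 38), we have

$$\langle n^\nu\rangle = \sum_{k=0}^{N}G_{0k}\left[(y\partial_y)^\nu\langle y|k\rangle\right]_{y=1}. \tag{65}$$

Inserting the expression for $\langle y|k\rangle$ (Eqn. 38) and defining $w \equiv \log y$, we obtain

$$\langle n^\nu\rangle = \sum_{k=0}^{N}\frac{G_{0k}}{(\bar\beta+1)^N}\,\partial_w^\nu\left[(e^w - 1)^k(\bar\beta e^w + 1)^{N-k}\right]_{w=0}. \tag{66}$$

At this point we recall that $\bar\beta$ is a constant we introduce to parameterize the expansion. The expression
for the moments therefore cannot depend on $\bar\beta$: if we change $\bar\beta$, the expression in brackets changes, but the
expansion coefficients $G_{0k}$ also change, such that Eqn. 66 evaluates to the same $\bar\beta$-independent form. We
are therefore free to set $\bar\beta$ to any value, and the choice $\bar\beta = 0$ makes the derivative easiest to evaluate. Thus
we have

$$\langle n^\nu\rangle = \sum_{k=0}^{N}G_{0k}\,\partial_w^\nu\left[(e^w - 1)^k\right]_{w=0}, \tag{67}$$

where it is now understood that $G_{0k}$ is computed with $\bar\beta = 0$. Evaluating the derivative yields

\begin{align}
\langle n^\nu\rangle &= \sum_{k=0}^{N}G_{0k}\,\partial_w^\nu\left[\sum_{\ell=0}^{k}\binom{k}{\ell}(-1)^{k-\ell}e^{\ell w}\right]_{w=0} \tag{68}\\
&= \sum_{k=0}^{N}G_{0k}\sum_{\ell=1}^{k}\frac{k!}{\ell!\,(k-\ell)!}(-1)^{k-\ell}\ell^{\nu} \tag{69}\\
&= \sum_{k=1}^{\min(N,\nu)}G_{0k}\,k!\,\left\{{\nu \atop k}\right\}, \tag{70}
\end{align}

in terms of the Stirling numbers of the second kind,

$$\left\{{\nu \atop k}\right\} = \frac{1}{k!}\sum_{\ell=0}^{k}(-1)^{k-\ell}\binom{k}{\ell}\ell^{\nu}. \tag{71}$$

For example, the first moment, second moment, and variance are

\begin{align}
\langle n\rangle &= G_{01}, \tag{72}\\
\langle n^2\rangle &= G_{01} + 2G_{02}, \tag{73}\\
\sigma_n^2 = \langle n^2\rangle - \langle n\rangle^2 &= G_{01} + 2G_{02} - G_{01}^2. \tag{74}
\end{align}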
These are exact expressions for the moments in terms of the expansion coefficients $G_{0k}$, which are obtained
by matrix inversion and multiplication via Eqn. 61, e.g. in Mathematica.

An informative special case is immediately revealed when $N = 1$, for which $G_{02}$ does not exist, i.e. $\langle n\rangle = G_{01}$
and $\sigma_n^2 = G_{01} - G_{01}^2$, or

$$\sigma_n^2 = \langle n\rangle(1 - \langle n\rangle) \qquad (N = 1). \tag{75}$$

Here there is only one output molecule. The relationship between its mean activation and the associated 
noise must therefore obey the known result for a single binary switch, Eqn. 75. 
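
The Stirling-number form of the moments is easy to evaluate once the coefficients $G_{0k}$ are available. The sketch below is illustrative only: the names `stirling2` and `output_moment` are hypothetical, the list `G0` is assumed to hold $G_{00}, G_{01}, \dots, G_{0N}$ computed with the expansion parameter set to zero, and only moments of order $\nu \ge 1$ are handled.

```python
from math import comb, factorial


def stirling2(nu, k):
    """Stirling number of the second kind from its explicit sum."""
    total = sum((-1) ** (k - l) * comb(k, l) * l ** nu for l in range(k + 1))
    return total // factorial(k)  # the sum is always divisible by k!


def output_moment(nu, G0):
    """nu-th moment (nu >= 1) of the output from coefficients G0[k]."""
    N = len(G0) - 1
    return sum(G0[k] * factorial(k) * stirling2(nu, k)
               for k in range(1, min(N, nu) + 1))
```

With placeholder coefficients `G0 = [1, 3, 5]`, chosen only to exercise the formula, the first two moments evaluate to `G0[1]` and `G0[1] + 2 * G0[2]`. For a single output molecule, `G0 = [1, g]`, the second moment equals the first, which recovers the variance of a binary switch.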



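B.5 Auxiliary calculations

Here we show that $\Gamma_{jj'}$ is tridiagonal for linear coupling $\beta_m = cm$:

\begin{align}
\Gamma_{jj'} &= \langle j|\hat\Gamma_x|j'\rangle \tag{76}\\
&= \langle j|(c\,\hat a^+\hat a^- - \bar\beta)|j'\rangle \tag{77}\\
&= -\bar\beta\delta_{jj'} + c\oint\frac{dx}{2\pi i}\,\langle j|x\rangle\langle x|\hat a^+\hat a^-|j'\rangle \tag{78}\\
&= -\bar\beta\delta_{jj'} + c\oint\frac{dx}{2\pi i}\,\langle j|x\rangle\,x\partial_x\langle x|j'\rangle \tag{79}\\
&= -\bar\beta\delta_{jj'} + c\oint\frac{dx}{2\pi i}\,\langle j|x\rangle\,x\partial_x\left[\frac{(x-1)^{j'}(\alpha x+1)^{M-j'}}{(\alpha+1)^M}\right] \tag{80}\\
&= -\bar\beta\delta_{jj'} + c\oint\frac{dx}{2\pi i}\,\langle j|x\rangle\,\frac{x}{(\alpha+1)^M}\left[j'(x-1)^{j'-1}(\alpha x+1)^{M-j'} + (x-1)^{j'}(M-j')(\alpha x+1)^{M-j'-1}\alpha\right] \tag{81}\\
&= -\bar\beta\delta_{jj'} + c\oint\frac{dx}{2\pi i}\,\langle j|x\rangle\,\frac{x(x-1)^{j'-1}(\alpha x+1)^{M-j'-1}}{(\alpha+1)^M}\left[j'(\alpha x+1) + (x-1)(M-j')\alpha\right] \tag{82}\\
&= -\bar\beta\delta_{jj'} + \frac{c}{\alpha+1}\oint\frac{dx}{2\pi i}\,\langle j|x\rangle\,\frac{(x-1)^{j'-1}(\alpha x+1)^{M-j'-1}}{(\alpha+1)^M}\Big\{j'(\alpha x+1)^2 + [\alpha(M-j') + j'](\alpha x+1)(x-1) + \alpha(M-j')(x-1)^2\Big\} \tag{83}\\
&= -\bar\beta\delta_{jj'} + \frac{c}{\alpha+1}\oint\frac{dx}{2\pi i}\,\langle j|x\rangle\,\Big\{j'\frac{(x-1)^{j'-1}(\alpha x+1)^{M-j'+1}}{(\alpha+1)^M} + [\alpha(M-j') + j']\frac{(x-1)^{j'}(\alpha x+1)^{M-j'}}{(\alpha+1)^M} + \alpha(M-j')\frac{(x-1)^{j'+1}(\alpha x+1)^{M-j'-1}}{(\alpha+1)^M}\Big\} \tag{84}\\
&= -\bar\beta\delta_{jj'} + \frac{c}{\alpha+1}\oint\frac{dx}{2\pi i}\,\langle j|x\rangle\,\Big\{\langle x|j'-1\rangle j' + \langle x|j'\rangle[\alpha(M-j') + j'] + \langle x|j'+1\rangle\alpha(M-j')\Big\} \tag{85}\\
&= -\bar\beta\delta_{jj'} + \frac{c}{\alpha+1}\Big\{\langle j|j'-1\rangle j' + \langle j|j'\rangle[\alpha(M-j') + j'] + \langle j|j'+1\rangle\alpha(M-j')\Big\} \tag{86}\\
&= \frac{cj'}{\alpha+1}\,\delta_{j,j'-1} + \left\{\frac{c[\alpha(M-j') + j']}{\alpha+1} - \bar\beta\right\}\delta_{jj'} + \frac{c\alpha(M-j')}{\alpha+1}\,\delta_{j,j'+1}. \tag{87}
\end{align}

Eqn. 77 recognizes that $\hat\beta_x = c\,\hat a^+\hat a^-$ is the operator representation of $\beta_m = cm$ (since $\hat a^+\hat a^-$ is the number operator,
i.e. $\hat a^+\hat a^-|m\rangle = m|m\rangle$), and Eqn. 83 uses the algebraic fact that $x[j'(\alpha x+1) + (x-1)(M-j')\alpha](\alpha+1) =
j'(\alpha x+1)^2 + [\alpha(M-j') + j'](\alpha x+1)(x-1) + \alpha(M-j')(x-1)^2$, which is straightforward to verify.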
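
The tridiagonal result can be checked by hand in the smallest case. The following worked example is added for illustration and assumes $M = 1$, linear coupling $\beta_m = cm$, and $\bar\beta$ denoting the constant expansion parameter. For $M = 1$ the eigenvectors are $\phi^0 = (1, \alpha)/(\alpha+1)$ and $\phi^1 = (-1, 1)/(\alpha+1)$, with conjugates $\psi^0 = (1, 1)$ and $\psi^1 = (-\alpha, 1)$, so that $\Gamma_{jj'} = \sum_m \psi_m^j(cm - \bar\beta)\phi_m^{j'}$ evaluates to

```latex
\begin{align}
\Gamma_{00} &= \frac{c\alpha}{\alpha+1} - \bar\beta, &
\Gamma_{01} &= \frac{c}{\alpha+1}, \\
\Gamma_{10} &= \frac{c\alpha}{\alpha+1}, &
\Gamma_{11} &= \frac{c}{\alpha+1} - \bar\beta .
\end{align}
```

These four entries coincide with the superdiagonal, diagonal, and subdiagonal terms of Eqn. 87 evaluated at $M = 1$.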
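Here we derive Eqn. 58:

\begin{align}
\Delta_{kk'} &= \langle k|\hat\Delta_y|k'\rangle \tag{88}\\
&= \langle k|(\hat a^+ - 1)(\hat a^+\hat a^- - N)|k'\rangle \tag{89}\\
&= \oint\frac{dy}{2\pi i}\,\langle k|y\rangle\langle y|(\hat a^+ - 1)(\hat a^+\hat a^- - N)|k'\rangle \tag{90}\\
&= \oint\frac{dy}{2\pi i}\,\langle k|y\rangle\,(y-1)(y\partial_y - N)\langle y|k'\rangle \tag{91}\\
&= \oint\frac{dy}{2\pi i}\,\langle k|y\rangle\,(y-1)(y\partial_y - N)\left[\frac{(y-1)^{k'}(\bar\beta y+1)^{N-k'}}{(\bar\beta+1)^N}\right] \tag{92}\\
&= \oint\frac{dy}{2\pi i}\,\langle k|y\rangle\,\frac{y-1}{(\bar\beta+1)^N}\Big[yk'(y-1)^{k'-1}(\bar\beta y+1)^{N-k'} + y(y-1)^{k'}(N-k')(\bar\beta y+1)^{N-k'-1}\bar\beta - N(y-1)^{k'}(\bar\beta y+1)^{N-k'}\Big] \tag{93}\\
&= \oint\frac{dy}{2\pi i}\,\langle k|y\rangle\,\frac{(y-1)^{k'}(\bar\beta y+1)^{N-k'-1}}{(\bar\beta+1)^N}\Big[yk'(\bar\beta y+1) + y(y-1)(N-k')\bar\beta - N(y-1)(\bar\beta y+1)\Big] \tag{94}\\
&= \oint\frac{dy}{2\pi i}\,\langle k|y\rangle\,\frac{(y-1)^{k'}(\bar\beta y+1)^{N-k'-1}}{(\bar\beta+1)^N}\Big[k'(\bar\beta y+1) - (y-1)(N-k')\Big] \tag{95}\\
&= \oint\frac{dy}{2\pi i}\,\langle k|y\rangle\,\left[k'\frac{(y-1)^{k'}(\bar\beta y+1)^{N-k'}}{(\bar\beta+1)^N} - (N-k')\frac{(y-1)^{k'+1}(\bar\beta y+1)^{N-k'-1}}{(\bar\beta+1)^N}\right] \tag{96}\\
&= \oint\frac{dy}{2\pi i}\,\langle k|y\rangle\,\Big[k'\langle y|k'\rangle - (N-k')\langle y|k'+1\rangle\Big] \tag{97}\\
&= k'\langle k|k'\rangle - (N-k')\langle k|k'+1\rangle \tag{98}\\
&= k'\delta_{kk'} - (N-k')\delta_{k,k'+1} \tag{99}\\
&= k\,\delta_{kk'} - (N-k+1)\,\delta_{k-1,k'}. \tag{100}
\end{align}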


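Here we derive Eqn. 62:

\begin{align}
\vec G_0 &= G_{j0} \tag{101}\\
&= \langle j, k=0|p\rangle \tag{102}\\
&= \sum_{m=0}^{M}\sum_{n=0}^{N}\langle j|m\rangle\langle k=0|n\rangle\langle m, n|p\rangle \tag{103}\\
&= \sum_{m=0}^{M}\sum_{n=0}^{N}\langle j|m\rangle\,p_{mn} \tag{104}\\
&= \sum_{m=0}^{M}\langle j|m\rangle\,p_m \tag{105}\\
&= \sum_{m=0}^{M}\langle j|m\rangle\langle m|j=0\rangle \tag{106}\\
&= \langle j|j=0\rangle \tag{107}\\
&= \delta_{j0}. \tag{108}
\end{align}

Eqn. 104 uses Eqn. 47 to obtain $\langle k=0|n\rangle = 1$, and Eqn. 106 recognizes that $p_m$ is the steady state of the
uncoupled operator, $p_m = \phi_m^0 = \langle m|j=0\rangle$.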
C Supplementary figures

Figure 7: The effects of partitioning persist for Michaelis-Menten coupling. The coupling is described by
$\beta_{m_i} = \beta m_i/[m_i + (N/\pi)K] = \beta m_i/(m_i + \phi M/\pi)$, where $m_i$ is the number of $X^*$ molecules in partition
$i \in \{1, \dots, \pi\}$, and $\phi \equiv KN/M$ is a constant. Here $\beta = 20$, $\phi = 1/2$, and $\gamma = 1$.

A, B As in Fig. 2 of the main text, with $M = N = 2$, perfect partitioning linearizes the input-output relation
and reduces the noise, transmitting more information than the well-mixed system; further, allowing exchange
among partitions compresses the response and increases the noise compared to the perfectly partitioned
system, transmitting less information than the well-mixed system.

C, D As in Fig. 5 of the main text, an information-optimal partition size, here $M/\pi^* = N/\pi^* \approx 2$, emerges
due to the trade-off between optimizing signaling reliability and avoiding unfavorable configurations.
Figure 8: Reducing the number of input molecules linearizes the input-output response and increases the
noise in the output. Here $\pi = 1$, $\beta = 20$, and $\gamma = 1$.

A The output (the mean activity of $N = 2$ $Y$ molecules) vs. the input (the mean activity of $M$ $X$ molecules)
for several values of $M$. As $M$ is reduced the response becomes more linear, deviating more strongly from
the mean-field response $\langle n\rangle/N = \beta q/(\beta q + 1)$. Symbols show 20 uniformly spaced values of $q$ to highlight
the effect of saturation on the state space.

B The noise vs. the mean for the output, shown for the same values of $M$. As $M$ is reduced the noise
increases for all values of the mean.
Figure 9: The emergence of an optimal partition size is robust to parameter variations.

A, B Results from the minimal system, described by the chemical master equation, as in Fig. 5B of the main
text. The information-optimal partition number $\pi^*$ is plotted as a function of molecule number $M = N$
for various values of $\beta$ (A) and $\gamma$ (B). Linear fits provide estimates of the optimal partition size $M/\pi^*$, as
indicated in the legends. In A, $\gamma = 1$; in B, $\beta = 20$.

C, D Results from the lattice simulation, in which space is accounted for explicitly, as in Fig. 6C of the
main text. The information is plotted as a function of the partition size, directly revealing an optimum, for
various values of $\beta$ (C) and $\gamma$ (D). Parameters are as in Fig. 6C: $M = N = 49$, $p_{\rm hop} = 0.001$, $\lambda = 70$, and
$p_D/p_r = 1$. In C, $\gamma = 1$; in D, $\beta = 20$.

As discussed in the main text, the optimum arises due to a tradeoff between two key effects of partitioning:
on the one hand, partitioning removes correlations in the states of $Y$ molecules, reducing noise; on the other
hand, partitioning isolates molecules, reducing the maximal response. The first effect favors few molecules
per partition, while the second effect favors many molecules per partition.

As seen here in both the minimal system (A, B) and the simulated system (C, D), lowering $\beta$ or $\gamma$ increases
the optimal number of molecules per partition. This result has an intuitive explanation in terms of the above
tradeoff: lowering either $\beta$ or $\gamma$ slows the rate of switching from the $Y$ to the $Y^*$ state, with respect to the
timescale of $X$ switching. As a result, $Y$ molecules are less sensitive to individual fluctuations in the state
of $X$ molecules. The states of the $Y$ molecules therefore exhibit weaker correlations, which in turn weakens
the benefit that partitioning imparts in terms of the removal of these correlations. The opposing effect of
molecular isolation thus begins to dominate, pushing the optimum toward a larger number of molecules per
partition.
Figure 10: The optimal partition size has only a weak dependence on the number of output molecules. The
information-optimal partition number $\pi^*$ is plotted as a function of the number of $X$ molecules $M$ and the
number of $Y$ molecules $N$. The dependence of $\pi^*$ on $N$ is weak, such that the partition size $M/\pi^* \approx 3$-$4$
is roughly constant over the range of $N$ values.
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Figure 11; The effects of partitioning are robust to extrinsic noise. 

A Simulations are performed with extrinsic noise introduced to the input parameter $\alpha$. To keep 
$\alpha > 0$, the quantity $z = \log\alpha$ is described by the simple mean-reverting Ornstein-Uhlenbeck 
process $dz = r(\mu - z)\,dt + \eta\sqrt{r\,dt}\,\xi$, where $\xi$ is a Gaussian random variable with mean 0 
and variance 1; this results in a log-normal distribution for $\alpha$. The quantity $1/r$ is the 
autocorrelation time, and the choices $\mu = \log[\bar{\alpha}^3/(\bar{\alpha} + c)]/2$ and 
$\eta = \sqrt{2\log(1 + c/\bar{\alpha})}$ ensure that the mean of $\alpha$ is $\bar{\alpha}$ and that the 
variance of $\alpha$ scales with the mean via $\sigma_\alpha^2 = c\bar{\alpha}$. 

B As the magnitude of the extrinsic noise (set by $c$) increases, the information $I[\alpha, n]$ decreases for 
all partition sizes, while the presence of an information-optimal partition size persists. 

Here $M = N = 25$, $\beta = 20$, $\gamma = 1$, $p_{\rm hop} = 0.001$, the system is A = 50 lattice spacings 
squared, the ratio of diffusion to reaction propensities is $p_D/p_r = 1$, and $r = 1$ in units of the 
$X^* \to X$ reaction rate (which sets the timescale of switching). In A, $\bar{\alpha} = c = 1$ and time is 
scaled by $1/r$. In B, when $c = 0$, the information transmission is lower than that in Fig. 6C of the main 
text because $M = N$ is lower. 
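
The parameter choices quoted for the Ornstein-Uhlenbeck process in panel A can be checked numerically. The 
sketch below is not the simulation code behind the figure; it only assumes that the noise term carries the 
prefactor $\eta\sqrt{r\,dt}$, so that the stationary variance of $z$ is $\eta^2/2$, and uses a plain 
Euler-Maruyama update. The function names, the time step, and the seed are illustrative choices. 

```python
import math
import random


def ou_parameters(alpha_bar, c):
    """Return (mu, eta) as quoted in the caption for a target mean alpha_bar and noise strength c."""
    mu = 0.5 * math.log(alpha_bar ** 3 / (alpha_bar + c))
    eta = math.sqrt(2.0 * math.log(1.0 + c / alpha_bar))
    return mu, eta


def lognormal_moments(mu, eta):
    """Mean and variance of alpha = exp(z) when z is Gaussian with mean mu and variance eta^2 / 2."""
    s2 = eta ** 2 / 2.0  # stationary variance of z (assumed noise prefactor eta * sqrt(r dt))
    mean = math.exp(mu + s2 / 2.0)
    var = (math.exp(s2) - 1.0) * mean ** 2
    return mean, var


def simulate_z(mu, eta, r=1.0, dt=0.01, n_steps=200_000, seed=0):
    """Euler-Maruyama samples of dz = r (mu - z) dt + eta sqrt(r dt) xi, started at z = mu."""
    rng = random.Random(seed)
    z = mu
    samples = []
    for _ in range(n_steps):
        xi = rng.gauss(0.0, 1.0)  # unit Gaussian increment
        z += r * (mu - z) * dt + eta * math.sqrt(r * dt) * xi
        samples.append(z)
    return samples


if __name__ == "__main__":
    mu, eta = ou_parameters(alpha_bar=1.0, c=1.0)
    print("analytic mean and variance of alpha:", lognormal_moments(mu, eta))
    zs = simulate_z(mu, eta)
    z_mean = sum(zs) / len(zs)
    z_var = sum((z - z_mean) ** 2 for z in zs) / len(zs)
    print("sampled mean and variance of z:", z_mean, z_var)
```

Under these assumptions the analytic moments reproduce a mean of $\bar{\alpha}$ and a variance of 
$c\bar{\alpha}$ for $\alpha$, while the sampled statistics of $z$ approach $\mu$ and $\eta^2/2$ up to 
discretization and sampling error. 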
Figure 12: The effects of partitioning are robust to receptor dimerization. Two dimerization schemes are 
simulated, which are paradigmatic for receptor tyrosine kinases, including EGF receptor [37]: A dimerization 
is receptor-mediated (left, inset), meaning two active receptors X* form a complex C, or B dimerization is 
ligand-mediated (left, inset), meaning an active receptor X* and an inactive receptor X form a complex C. 
The latter scheme admits a "dead-end" state at ligand saturation, when all receptors are ligand-bound and 
no complexes can form, leading to a non-monotonic response curve (B, left), as observed e.g. for the Ret 

receptor [38]. Both schemes are described by the reactions $X \rightleftharpoons X^*$, $C + Y \to C + Y^*$, 
and $Y^* \to Y$, with dimer formation described by $X^* + X^* \underset{\lambda}{\rightleftharpoons} C$ in A 
or $X^* + X \underset{\lambda}{\rightleftharpoons} C$ in B. Here $M = N = 25$, $\beta = 20$, 
$\lambda = \gamma = 1$, the system is A = 50 lattice spacings squared, and the ratio of diffusion to reaction 
propensities is $p_D/p_r = 1$. In A, $\epsilon = 20$; in B, $\epsilon = 5$. 

Left As in Fig. 6A of the main text, as the probability of crossing a diffusion barrier $p_{\rm hop}$ is 
decreased, the maximal value of the mean response decreases. In A, the response also becomes more linear, but 
to less of a degree than in Fig. 6A of the main text. Note that due to both finite diffusion and finite 
molecule number, even the unpartitioned response ($p_{\rm hop} = 1$) deviates from the mean-field response 
(black solid line), which is given by $\langle n\rangle/N = \beta f/(1 + \beta f)$, where $f$ is the fraction 
of X molecules in the dimer state; in A, $f = \epsilon g^2$ with 
$g = \langle m\rangle/M = (\sqrt{1 + 8\epsilon q^2} - 1)/(4\epsilon q)$, while in B, 
$f = \epsilon g(1 - g)/(2\epsilon g + 1)$ with 
$g = \langle m\rangle/M = [\sqrt{1 + 8\epsilon q(1 - q)} - 1]/[4\epsilon(1 - q)]$. Here $\pi = 25$. Legends in 
middle panels apply to left panels as well. 

Middle As in Fig. 6B of the main text, as the probability of crossing a diffusion barrier $p_{\rm hop}$ is 
decreased, the output noise decreases. Black dashed line shows the binomial noise limit 
$\sigma^2/N = (\langle n\rangle/N)(1 - \langle n\rangle/N)$. In B, lines connecting data points are provided 
to reveal that, as there are two values of $q$ that give the same mean $\langle n\rangle/N$ (left), the noise 
is higher for the smaller value of $q$. Here $\pi = 25$. 

Right As in Fig. 6C of the main text, the tradeoff between reliable signaling (reduced noise) and efficient 
signaling (maintaining a high maximal response) leads to an information-optimal partition size. Here 
$p_{\rm hop} = 0.001$. The information transmission is lower than that in Fig. 6C of the main text because 
$M = N$ is lower and additionally, in B, because of the non-monotonic mean response. 
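
The mean-field response curves quoted for the left panels can be evaluated directly. The sketch below simply 
transcribes the expressions for $g$, $f$, and $\langle n\rangle/N$ from the caption; it assumes 
$q = \alpha/(\alpha + 1)$ with $0 < q < 1$ and unit dissociation rate, and the function names and the scheme 
labels "A" and "B" are illustrative. 

```python
import math


def free_active_fraction(q, eps, scheme):
    """Mean-field fraction g of X molecules that are active and not in a dimer (0 < q < 1)."""
    if scheme == "A":  # receptor-mediated: X* + X* <-> C
        return (math.sqrt(1.0 + 8.0 * eps * q * q) - 1.0) / (4.0 * eps * q)
    if scheme == "B":  # ligand-mediated: X* + X <-> C
        return (math.sqrt(1.0 + 8.0 * eps * q * (1.0 - q)) - 1.0) / (4.0 * eps * (1.0 - q))
    raise ValueError("scheme must be 'A' or 'B'")


def dimer_fraction(q, eps, scheme):
    """Mean-field dimer fraction f appearing in the response formula."""
    g = free_active_fraction(q, eps, scheme)
    if scheme == "A":
        return eps * g * g
    return eps * g * (1.0 - g) / (2.0 * eps * g + 1.0)


def mean_response(q, eps, beta, scheme):
    """Mean-field response <n>/N = beta f / (1 + beta f)."""
    f = dimer_fraction(q, eps, scheme)
    return beta * f / (1.0 + beta * f)


if __name__ == "__main__":
    for q in (0.1, 0.5, 0.9):
        print(q, mean_response(q, 20.0, 20.0, "A"), mean_response(q, 5.0, 20.0, "B"))
```

With the caption's parameters, scheme A gives a response that rises monotonically with $q$, whereas scheme B 
peaks at intermediate $q$ and falls again toward saturation, which is the "dead-end" behavior described 
above. 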
Figure 13: The effects of partitioning are robust to the shape of the input distribution. As in Fig. 5 of the 
main text, which takes a uniform input distribution, an information-optimal partition size 
$M/\pi^* = N/\pi^*$ persists with an input distribution that is A unimodal or B bimodal. 
Figure 14: Computation of the mutual information converges as the input is more finely discretized. The 
relative error $|I - I_0|/I_0$, where $I_0$ is the information at $N_\alpha = 100$, is plotted against the 
number $N_\alpha$ of values of $\alpha$ [uniformly spaced in $q = \alpha/(\alpha + 1)$] used in the 
computation. Five conditions are tested, as indicated in the legend. It is seen that the relative error falls 
below ~1% in all conditions for $N_\alpha > 30$. 
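
The convergence check described for Fig. 14 can be illustrated on a toy channel. The sketch below is not the 
model used in the paper: as a stand-in, it assumes a binomial output with success probability 
$\beta q/(1 + \beta q)$, places the inputs at the midpoints of equal-width bins in $q$ (one possible uniform 
spacing), and weights them equally. Only the procedure of comparing the information at a given number of 
input values against a reference at 100 values follows the caption. 

```python
import math


def binomial_pmf(n, N, p):
    """Probability of n successes in N trials with success probability p."""
    return math.comb(N, n) * p ** n * (1.0 - p) ** (N - n)


def mutual_information(num_inputs, N=25, beta=20.0):
    """Mutual information (bits) between a discretized input q and the output n of the toy channel."""
    qs = [(k + 0.5) / num_inputs for k in range(num_inputs)]  # midpoints, uniformly spaced in q
    cond = []
    for q in qs:
        p = beta * q / (1.0 + beta * q)  # assumed stand-in response curve
        cond.append([binomial_pmf(n, N, p) for n in range(N + 1)])
    # Output marginal under equally weighted inputs.
    marg = [sum(row[n] for row in cond) / num_inputs for n in range(N + 1)]
    info = 0.0
    for row in cond:
        for n, pn in enumerate(row):
            if pn > 0.0:
                info += pn * math.log2(pn / marg[n]) / num_inputs
    return info


def relative_error(num_inputs, reference_inputs=100):
    """Relative error |I - I0| / I0 with I0 evaluated at reference_inputs input values."""
    i0 = mutual_information(reference_inputs)
    return abs(mutual_information(num_inputs) - i0) / i0


if __name__ == "__main__":
    for k in (5, 10, 20, 30, 50):
        print(k, relative_error(k))
```

The specific error values depend on the channel, so this toy example should not be expected to reproduce the 
curves in the figure; it only shows how such a convergence test can be set up. 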